Tools: I Built a Bot That Updates My EKS Nodes While I Sleep — Here's How - Analysis

The problem every EKS team hits eventually

The solution in one sentence

Architecture: Three clean phases

Phase 1 — Detection

Phase 2 — AI Analysis + Pull Request

Phase 3 — GitOps Deployment

What the PR actually looks like

CloudFormation: everything in one stack

Prerequisites checklist

Testing it without waiting for an AMI release

Common issues and how to fix them

Why this architecture is worth copying

Cleaning up

What I'd add next

Get the code

Wrapping up

TL;DR: Manual EKS AMI updates are slow, risky, and easy to forget. I wired together EventBridge, Lambda, Amazon Bedrock (Claude 3.5 Haiku), GitHub PRs, ArgoCD, and Karpenter into a pipeline that detects new AMIs, runs AI risk analysis, opens a PR for human review, and rolls out nodes automatically — zero downtime, full audit trail.

The problem every EKS team hits eventually

You're running production Kubernetes on AWS. You know you're supposed to keep worker nodes patched. But between sprints, incidents, and everything else — checking for new EKS-optimized AMIs falls through the cracks.

When you finally do an update, there's a whole ritual: find the new AMI ID, read through the release notes, assess any CVEs, draft a PR, wait for approvals, then carefully roll out nodes without taking down your workloads.

It's not rocket science — it's just slow, manual, and one of those tasks that always feels lower priority than the thing currently on fire.

What if the whole thing ran itself?

The solution in one sentence

Twice a day, a Lambda checks for new EKS AMIs. If one exists, Bedrock analyzes the risk and opens a GitHub PR. A human reviews it. Merging the PR triggers ArgoCD + Karpenter to roll out the new nodes with zero downtime.

Architecture: Three clean phases

The magic is that the only thing a human needs to do is read the AI's analysis and merge (or close) the PR. Everything else — detection, analysis, branch creation, notification, node rollout — is automated.

Phase 1 — Detection

An EventBridge scheduled rule fires at 9 AM and 9 PM UTC every day. It triggers a Lambda that:

No new AMI? The Lambda exits quietly. Nothing else happens.

Phase 2 — AI Analysis + Pull Request

This is where it gets interesting. AWS Step Functions orchestrates three Lambda functions in sequence:

Lambda 1 — bedrock-analyzer

Fetches the real AMI release notes from GitHub (awslabs/amazon-eks-ami) and sends them to Amazon Bedrock running Claude 3.5 Haiku with this prompt:

The output is a structured JSON object with a risk score and a ready-to-paste PR description.
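Since the next two Lambdas depend on that JSON, it may be worth validating it before passing it along. The following is a minimal sketch, not the project's actual code: the field names come from the prompt, but the function name `parse_analysis` and the strict integer check on `risk_score` are assumptions made for illustration.

```python
import json

# Field names taken from the prompt; everything else here is illustrative.
REQUIRED_FIELDS = ("risk_score", "recommendation", "summary", "pr_description")


def parse_analysis(raw: str) -> dict:
    """Parse the model's reply and check it has the shape the prompt asks for.

    Raises ValueError on anything unexpected so the workflow can fail loudly
    instead of opening a PR with a malformed description.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    score = data["risk_score"]
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"risk_score must be an integer from 1 to 10, got {score!r}")

    if data["recommendation"] not in ("APPROVE", "REJECT"):
        raise ValueError(f"unexpected recommendation: {data['recommendation']!r}")

    return data


example_reply = (
    '{"risk_score": 2, "recommendation": "APPROVE", '
    '"summary": "Routine kernel and runtime update", "pr_description": "..."}'
)
analysis = parse_analysis(example_reply)
print(analysis["risk_score"], analysis["recommendation"])
```

Failing fast here keeps a bad model reply from reaching the PR step; whether to retry the Bedrock call or just alert is a separate design choice.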
Lambda 2 — gitops-updater

Uses GitHub App credentials (stored in AWS Secrets Manager) to:

Lambda 3 — send-notification

Fires an SNS email to the team: "New AMI detected, PR #N is open for your review." Includes the PR link and the one-line AI summary.

The human's job: Read the AI analysis. Check the YAML diff (it's literally one line — the AMI ID). Merge to approve, close to reject.

Phase 3 — GitOps Deployment

After the PR is merged:

The whole rollout happens without anyone touching kubectl.

What the PR actually looks like

This is what your team sees in GitHub:

Your reviewer doesn't need to dig through release notes. The AI already did it.

CloudFormation: everything in one stack

The whole solution deploys from a single CloudFormation template. Here's what it provisions:

Takes about 2–3 minutes. Confirm the SNS subscription email when it arrives.

Prerequisites checklist

Before deploying, you need:

Important: Fork the aws-samples repository to your own account — you need write access to configure the GitHub App. Deploy your EC2NodeClass config to the repo before running the stack.

Testing it without waiting for an AMI release

Don't want to wait up to 12 hours for the schedule to fire? Trigger it manually:

Check your inbox. You should get an SNS email with the risk analysis and PR link within a couple of minutes.

After merging, verify the ArgoCD sync:

Common issues and how to fix them

SNS subscription not confirmed — Check your spam folder. The confirmation email comes from AWS and sometimes gets filtered.

GitHub App auth failure — Double-check the App is installed on the correct repository with read/write permissions. Regenerate the private key in GitHub if needed and re-run the CloudFormation update.

Bedrock access denied — Go to the Amazon Bedrock console → Model access → enable Claude 3.5 Haiku in your region. This is a manual step that's easy to miss.

ArgoCD not syncing — Verify the Application resource has spec.syncPolicy.automated set. Check that the repo URL and path match exactly.

Step Functions failures — Check CloudWatch Logs for the failing Lambda. 99% of the time it's an IAM permission issue or a missing secret.
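The "ArgoCD not syncing" checks above are easy to script. Here is a rough sketch that inspects an Application object already loaded as a Python dict (for example from `kubectl get application karpenter-nodeclass -n argocd -o json`). The helper name `check_application` is hypothetical, and it assumes a single-source Application that uses `spec.source` rather than `spec.sources`.

```python
def check_application(app: dict, repo_url: str, path: str) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    spec = app.get("spec") or {}

    # Auto-sync is considered on when spec.syncPolicy.automated is present.
    # An empty object ({}) counts as set; a missing or null key does not.
    sync_policy = spec.get("syncPolicy") or {}
    if sync_policy.get("automated") is None:
        problems.append("spec.syncPolicy.automated is not set")

    # Assumption: single-source Application (spec.source, not spec.sources).
    source = spec.get("source") or {}
    if source.get("repoURL") != repo_url:
        problems.append(f"repoURL is {source.get('repoURL')!r}, expected {repo_url!r}")
    if source.get("path") != path:
        problems.append(f"path is {source.get('path')!r}, expected {path!r}")

    return problems


# Placeholder values for illustration only.
example_app = {
    "spec": {
        "source": {
            "repoURL": "https://github.com/your-org/your-repo",
            "path": "karpenter-configs/clusters/your-cluster",
        },
        "syncPolicy": {},
    }
}
for problem in check_application(
    example_app,
    "https://github.com/your-org/your-repo",
    "karpenter-configs/clusters/your-cluster",
):
    print(problem)
```

The comparison is deliberately exact, since the troubleshooting note says the repo URL and path must match exactly; a trailing slash or `.git` suffix would be reported as a mismatch.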
Why this architecture is worth copying

A few design decisions I want to highlight:

GitHub PRs as the approval interface — Engineers already live in GitHub. Using a PR as the human gate means no new tool to learn, built-in commenting, and a permanent audit trail in Git history. The PR description IS the change record.

AI analysis on real release notes — The Bedrock prompt fetches actual release notes from the awslabs/amazon-eks-ami repo. It's not making things up — it's summarizing real content. The risk score is grounded in actual CVE and package data.

Karpenter over managed node groups — Karpenter watches the EC2NodeClass for changes and handles the node lifecycle automatically. You don't need to write any drain/cordon scripts.

Least-privilege IAM — Each Lambda has its own role with only the permissions it needs. The CF template provisions five separate roles. This matters in production.

Guardrails on Bedrock — The solution includes a Bedrock Guardrail for content filtering on the AI output. Belt and suspenders.

What I'd add next

A few things that would make this even better:

Get the code

Fork the repo, follow the README, and deploy:

👉 GitHub: suryansh639/sample-eks-ami-gitops-pipeline

The CloudFormation template, Lambda code, and example Karpenter configs are all there.

Wrapping up

The goal wasn't to remove humans from the loop — it was to remove the boring part of the loop. The AI reads the release notes. The AI writes the PR description. The human decides. The automation executes.

That's the right split. And it means your nodes actually get updated on time, every time, with a full audit trail and no 2 AM surprises.

If you try this out, drop a comment — I'd love to hear what customizations you make.

Prompt sent to Bedrock by bedrock-analyzer:

Analyze this Amazon EKS AMI update using the actual release notes.

New AMI ID: {ami_id}
Previous AMI ID: {previous_ami}

ACTUAL EKS AMI RELEASE NOTES:
{release_notes}

Respond in JSON with:
- risk_score: 1–10
- recommendation: APPROVE or REJECT
- summary: one-line summary of actual changes
- pr_description: full markdown PR body with CVEs, package versions, risk assessment, and review guidance
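To make the placeholders in that prompt concrete, here is one way they might be filled in before the Bedrock call. This is a sketch under assumptions: the template text is copied from the prompt above, while `build_prompt` and the `MAX_NOTES_CHARS` cap are hypothetical names and values, not taken from the project.

```python
PROMPT_TEMPLATE = """Analyze this Amazon EKS AMI update using the actual release notes.

New AMI ID: {ami_id}
Previous AMI ID: {previous_ami}

ACTUAL EKS AMI RELEASE NOTES:
{release_notes}

Respond in JSON with:
- risk_score: 1–10
- recommendation: APPROVE or REJECT
- summary: one-line summary of actual changes
- pr_description: full markdown PR body with CVEs, package versions, risk assessment, and review guidance
"""

# Hypothetical cap so unusually long release notes cannot blow up the prompt size.
MAX_NOTES_CHARS = 20_000


def build_prompt(ami_id: str, previous_ami: str, release_notes: str) -> str:
    """Fill the template; str.format leaves braces inside the release notes untouched."""
    notes = release_notes.strip()[:MAX_NOTES_CHARS]
    return PROMPT_TEMPLATE.format(
        ami_id=ami_id, previous_ami=previous_ami, release_notes=notes
    )


print(build_prompt("ami-04b406d4e6eaca578", "ami-0123456789abcdef0", "Kernel updated."))
```

Truncating the notes is only one option; summarizing them in a separate call or failing when they exceed the cap would be equally reasonable.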

Example PR description:

EKS AMI Update — ami-04b406d4e6eaca578

**AI Risk Score: 2/10 — APPROVE**

What changed

- Go updated to 1.25.9
- Kernel updated to 6.12.79-101.147.amzn2023
- No new CVEs introduced

CVE Assessment

No critical or high-severity CVEs in this update. Two previously known CVEs (CVE-2024-XXXX, CVE-2024-YYYY) are patched.

Review guidance

This is a routine kernel + runtime update. Low risk. Recommend merging during business hours with normal monitoring in place.

---

*Merge this PR to trigger ArgoCD + Karpenter rollout.*

*Close this PR to skip this AMI version.*
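The change that accompanies a PR like this is a single line in the EC2NodeClass manifest. As a rough illustration of how an updater could produce it, the sketch below swaps the AMI ID in the manifest text. It is not the project's implementation: `replace_ami_id`, the regex, and the example manifest are all assumptions, and the code refuses to act unless exactly one distinct AMI ID is present.

```python
import re

# Real AMI IDs carry 8 or 17 hex characters; this pattern is slightly looser.
AMI_ID_RE = re.compile(r"\bami-[0-9a-f]{8,17}\b")

# Illustrative manifest only; your EC2NodeClass will have more fields.
EXAMPLE_MANIFEST = """\
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - id: ami-0123456789abcdef0
"""


def replace_ami_id(manifest: str, new_ami: str) -> str:
    """Return the manifest text with its single AMI ID replaced by new_ami."""
    if not AMI_ID_RE.fullmatch(new_ami):
        raise ValueError(f"not a valid AMI ID: {new_ami!r}")
    found = set(AMI_ID_RE.findall(manifest))
    if len(found) != 1:
        # Zero or several distinct IDs means the file is not what we expect.
        raise ValueError(f"expected exactly one distinct AMI ID, found {sorted(found)}")
    return AMI_ID_RE.sub(new_ami, manifest)


updated = replace_ami_id(EXAMPLE_MANIFEST, "ami-04b406d4e6eaca578")
print(updated)
```

Editing the text rather than round-tripping through a YAML parser keeps comments and formatting intact, which is what makes the resulting diff a single line.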

Deploy the stack:

    aws cloudformation create-stack \
      --stack-name eks-ami-update \
      --template-body file://cloudformation-template.yaml \
      --capabilities CAPABILITY_NAMED_IAM \
      --parameters \
        ParameterKey=NotificationEmail,ParameterValue=<your-email> \
        ParameterKey=GitHubAppId,ParameterValue=<app-id> \
        ParameterKey=GitHubAppInstallationId,ParameterValue=<install-id> \
        ParameterKey=GitHubAppPrivateKey,ParameterValue=$(base64 -i app.pem | tr -d '\n') \
        ParameterKey=GitHubRepoOwner,ParameterValue=<your-org> \
        ParameterKey=GitHubRepoName,ParameterValue=<your-repo> \
        ParameterKey=GitHubFilePath,ParameterValue=karpenter-configs/clusters/your-cluster/nodeclass.yaml \
        ParameterKey=GitHubBranch,ParameterValue=main \
        ParameterKey=EKSVersion,ParameterValue=1.34

Trigger the detector manually:

    aws lambda invoke \
      --function-name eks-ami-detector \
      --payload '{}' \
      --cli-binary-format raw-in-base64-out \
      /tmp/response.json && cat /tmp/response.json

Verify the ArgoCD sync after merging:

    # Update your kubeconfig
    aws eks update-kubeconfig --region <region> --name <cluster-name>

    # Check ArgoCD sync policy
    kubectl get application karpenter-nodeclass -n argocd \
      -o jsonpath='{.spec.syncPolicy}'

    # Verify the AMI ID was applied
    kubectl get ec2nodeclass default -o yaml | grep ami-

Cleaning up:

    aws cloudformation delete-stack --stack-name eks-ami-update

Phase 1 — Detection. The Lambda:

- Queries AWS SSM Parameter Store for the latest EKS-optimized AMI ID (/aws/service/eks/optimized-ami/1.34/amazon-linux-2023/recommended/image_id)
- Compares it against what's currently committed in your GitHub repository (your source of truth)
- If they differ — new AMI exists → triggers the Step Functions workflow

Lambda 2 — gitops-updater. Uses GitHub App credentials to:

- Create a new branch
- Update the Karpenter EC2NodeClass YAML with the new AMI ID
- Open a Pull Request with the full Bedrock analysis embedded in the description

Phase 3 — GitOps Deployment. After the PR is merged:

- ArgoCD detects the commit on main, auto-syncs the updated EC2NodeClass manifest to the EKS cluster
- Karpenter sees the new AMI ID in the EC2NodeClass, provisions new EC2 nodes with the updated AMI, then gracefully drains the old nodes
- Workloads migrate to new nodes. Zero downtime.

Prerequisites checklist:

- [ ] An existing EKS cluster (v1.34+)
- [ ] Karpenter installed and configured
- [ ] ArgoCD installed with auto-sync enabled
- [ ] A GitHub repository for Karpenter configs
- [ ] A GitHub App installed on that repo (you need App ID, Installation ID, and Private Key)
- [ ] Amazon Bedrock enabled in your region (enable Claude 3.5 Haiku access in the Bedrock console)
- [ ] AWS CLI + kubectl configured

What I'd add next:

- Slack notification instead of (or in addition to) SNS email — PR link directly in your #platform channel
- Dry-run mode — run the full pipeline but don't actually open a PR, just log the analysis
- Multi-cluster support — one stack managing AMI updates across dev/staging/prod with different approval thresholds per environment
- Custom risk criteria — tune the Bedrock prompt to your org's specific compliance requirements (PCI-DSS, SOC 2, etc.)
- Automatic REJECT on critical CVEs — skip the PR entirely and alert the team if the risk score is 8+
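Two of those ideas, dry-run mode and automatic rejection at a risk score of 8 or higher, come down to one small routing decision after the analysis step. The sketch below shows one possible shape for it. None of it exists in the current pipeline: the function `next_step` and the step names `log_only`, `alert_only`, and `open_pr` are hypothetical.

```python
def next_step(risk_score: int, *, dry_run: bool = False, reject_threshold: int = 8) -> str:
    """Decide what the workflow should do after the Bedrock analysis.

    Returns one of three hypothetical step names:
      "log_only"   - dry-run mode: record the analysis, open no PR
      "alert_only" - risk at or above the threshold: skip the PR, alert the team
      "open_pr"    - normal path: open the PR for human review
    """
    if dry_run:
        return "log_only"
    if risk_score >= reject_threshold:
        return "alert_only"
    return "open_pr"


for score in (2, 8):
    print(score, next_step(score))
print(9, next_step(9, dry_run=True))
```

In a Step Functions workflow the same branching could live in a Choice state instead of Lambda code. Making the threshold a parameter also leaves room for the per-environment approval thresholds mentioned in the multi-cluster idea.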