Tools: Why We Moved from GKE to EKS (2026)

2026-04-26 0 views admin

When we initially adopted Kubernetes, Google Kubernetes Engine (GKE) Autopilot seemed like the perfect choice: fully managed, minimal operational overhead, and quick to get started. But as our workloads matured, three major challenges started to surface:

- Rising and unpredictable costs
- Compliance constraints
- The need for deeper infrastructure control

This post walks through why we migrated to Amazon Elastic Kubernetes Service (EKS) with Karpenter, the architectural changes we made, and the lessons we learned running production workloads post-migration.

⚠️ Why GKE Autopilot Started Falling Short

1. Cost Inefficiencies at Scale

GKE Autopilot pricing is based on requested resources, not actual usage. This sounds fine at small scale, but as traffic grows, the gap between requested and actual usage starts to compound.

Problems we observed:

- Over-provisioned workloads leading to higher bills
- No access to Spot/Preemptible node strategies with the same level of flexibility
- Very few cost optimization knobs to tune

As traffic grew, costs increased almost linearly, with no meaningful way to optimize without restructuring our entire workload configuration.

2. Compliance and Governance Constraints

Operating in a regulated environment required:

- Fine-grained IAM control at the workload level
- Strict network isolation between services
- Audit-level visibility into infrastructure activity

With GKE Autopilot, several configurations are abstracted away or restricted by design. This made it harder to enforce organization-wide security policies and satisfy compliance requirements from auditors. Specifically:

- Enforcing per-pod IAM permissions cleanly was non-trivial
- Network policy enforcement had gaps in our specific setup
- Generating audit-ready logs tied to individual workload actions required workarounds

We needed something that gave us first-class integration with cloud-native IAM and security tooling, without layering on custom solutions.

3. Limited Infrastructure Control

When performance-sensitive services started hitting bottlenecks, the inability to choose instance types became a real blocker. We had no control over:

- CPU-optimized vs. memory-optimized instance selection
- ARM-based workloads (Graviton processors, in AWS terms)
- Custom node images (AMIs, in AWS terms) or low-level networking tuning

For teams running general-purpose workloads, this abstraction is a feature. For us, it was a ceiling.

🎯 Why We Chose EKS + Karpenter

Full Infrastructure Control

Moving to EKS gave us direct control over:

- Instance families: CPU-optimized, memory-optimized, ARM (Graviton)
- Custom AMIs: hardened images meeting our internal security baseline
- Networking: VPC-native networking with fine-grained subnet and security group control

This unlocked workload-specific performance tuning that simply wasn't possible before.

Advanced Cost Optimization with Karpenter

Karpenter is not your traditional cluster autoscaler. Instead of scaling pre-defined node groups, it:

- Watches for unschedulable pods in real time
- Selects the right-sized instance based on actual pod requirements
- Prioritizes Spot instances where workloads allow, falling back to On-Demand seamlessly
- Bin-packs nodes efficiently, reducing idle capacity

The result: faster scaling reactions and a dramatically lower compute bill, without sacrificing reliability.

Compliance Alignment

AWS gave us the compliance story we needed:

- IRSA (IAM Roles for Service Accounts): precise, per-pod IAM permissions with no shared credentials
- VPC-level isolation: full control over ingress, egress, and inter-service communication
- CloudTrail integration: every API call, every node action, fully auditable out of the box
- AWS Config + Security Hub: continuous compliance checks against CIS benchmarks and custom rules

This made our next compliance audit significantly smoother. Auditors got clear, traceable logs without us having to build custom instrumentation.

🏗️ Target Architecture

Here's what the high-level migration looked like architecturally:

[architecture diagram]

⚙️ Karpenter Setup: The Game Changer

Karpenter replaced our traditional Cluster Autoscaler, and the difference was immediately visible.

How we configured it:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-arm64
spec:
  template:
    metadata:
      labels:
        # -----------------------------------------------
        # These labels land on the EC2 node.
        # Your pod affinity rules match against these.
        # -----------------------------------------------
        node-pool: spot-arm64
        capacity-type: spot
        arch: arm64
        workload-class: standard
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
          # c6g, c7g, c8g: compute optimized Graviton
          # m6g, m7g, m8g: general purpose Graviton
          # r6g, r7g, r8g: memory optimized Graviton
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]  # Graviton2+ only (gen 6,7,8)
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 168h  # 7 days, shorter for spot nodes
  limits:
    cpu: 500
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 2m
  weight: 100
```

Key decisions we made:

- Spot-first provisioning: workloads that tolerate interruptions run on Spot; stateful services stay on On-Demand
- Multiple instance families: Karpenter picks the cheapest right-sized option across families
- Interruption handling: we use the Karpenter interruption queue (SQS) to gracefully drain Spot nodes before AWS reclaims them
- consolidateAfter: 2m: Karpenter considers a node for consolidation 2 minutes after it goes idle or underutilized, eliminating ghost capacity

📊 Results After Migration

💰 Cost

Compute costs dropped significantly. The main drivers:

- Spot instances covering the majority of our non-critical workloads
- Karpenter's bin-packing eliminating idle node waste
- Right-sized instances instead of over-provisioned static node groups

⚡ Performance

- Faster pod scheduling: Karpenter provisions new nodes in under 60 seconds in most cases
- Better workload isolation through custom node selectors and taints
- Graviton (ARM) instances for compatible workloads gave us a meaningful price-performance improvement

🔐 Compliance

- Audit reports are now generated directly from CloudTrail without custom tooling
- IRSA eliminated shared IAM credential risks
- Security Hub provides continuous posture monitoring against our compliance framework

⚠️ Challenges We Faced

Being honest here, because this is what makes a migration story actually useful.

1. Karpenter Learning Curve

Karpenter's provisioning model is fundamentally different from Cluster Autoscaler. Debugging why a node wasn't provisioned, or why Karpenter chose a specific instance type, required understanding its internal decision logic. The logs are verbose but not always immediately readable.

What helped: running Karpenter in dry-run mode first, and adding structured logging to correlate provisioning decisions with pod events.

2. Networking Model Differences

GKE's VPC-native networking and AWS VPC behave differently in non-obvious ways, especially around CIDR planning, secondary IP ranges, and how pod IPs are allocated. We had to redesign our subnet layout and revisit some service-to-service communication assumptions.

3. IAM Complexity

IRSA is powerful but requires careful role design. Mapping GCP Workload Identity bindings to AWS IRSA role assumptions took time, especially for services that had been granted broad IAM permissions under GCP and needed to be tightened properly.

🧠 Key Lessons Learned

- Managed ≠ always optimal at scale. Autopilot is excellent for getting started, but production-grade platforms eventually need control surfaces that fully managed offerings deliberately hide.
- Cost optimization requires infrastructure access. You can't tune what you can't see.
- Autoscaling strategy matters more than cluster size. Karpenter's approach of provisioning for the pod rather than scaling a group changed how we think about capacity planning entirely.
- Compliance is easier when the platform is designed for it. AWS's native compliance tooling removed a category of work that we were previously solving with custom scripts and log forwarding pipelines.
- Migration should always be incremental. Running a parallel environment with gradual DNS cutover and canary deployments meant we caught issues in staging before they became production incidents.

🏁 Conclusion

GKE Autopilot is an excellent choice for teams that want Kubernetes without the operational overhead, and we'd still recommend it for that use case.

But for production environments that require cost control at scale, fine-grained compliance posture, and workload-specific infrastructure decisions, EKS with Karpenter provided a more flexible and efficient platform. The migration wasn't trivial, but the control, visibility, and cost profile on the other side made it worth it.

Have you gone through a similar migration? Or are you evaluating EKS vs GKE for your stack? Drop your questions in the comments; we're happy to dig into specifics.

Tags: kubernetes aws devops cloud karpenter
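
The spot-arm64 NodePool only admits Spot capacity, so the "Spot-first, On-Demand for stateful services" behaviour described in the Karpenter setup needs a second pool. The manifest below is a minimal sketch of what such a companion pool could look like, not the configuration we actually ran. It assumes the same `default` EC2NodeClass; the name `on-demand-arm64`, the labels, the limits, the `expireAfter` and `consolidateAfter` values, and the weight of 10 are all illustrative assumptions.

```yaml
# Hypothetical On-Demand companion to the spot-arm64 NodePool.
# Every value marked "assumed" is illustrative, not from the original setup.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-arm64            # assumed name
spec:
  template:
    metadata:
      labels:
        node-pool: on-demand-arm64 # assumed custom labels, mirroring spot-arm64
        capacity-type: on-demand
        arch: arm64
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]    # the only functional difference from the Spot pool
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h            # assumed: longer-lived than Spot nodes
  limits:
    cpu: 200                       # assumed
    memory: 800Gi                  # assumed
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m           # assumed: slower churn for stateful pods
  weight: 10                       # lower than spot-arm64 (100), so Spot is tried first
```

Karpenter evaluates NodePools with a higher `weight` first, so with this pair the Spot pool is preferred and the On-Demand pool is used when a pod's constraints cannot be met there. A workload that must never land on Spot can pin itself with a `nodeSelector` on the well-known label `karpenter.sh/capacity-type: on-demand`.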

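To make the IRSA and node-label points concrete, here is a rough sketch of how a workload might pick up a per-pod IAM role and target the spot-arm64 pool through the labels set in the NodePool template. The service name `orders-api`, the `payments` namespace, the account ID `111122223333`, the role name, the image, and the resource requests are hypothetical placeholders.

```yaml
# Hypothetical workload: all names, the account ID, and the image are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api
  namespace: payments
  annotations:
    # IRSA: pods using this ServiceAccount receive credentials for this role only.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/orders-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  namespace: payments
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      serviceAccountName: orders-api   # binds the pods to the IRSA role above
      nodeSelector:
        node-pool: spot-arm64          # custom label from the NodePool template
        kubernetes.io/arch: arm64      # guards against scheduling onto amd64 nodes
      containers:
        - name: app
          image: example.com/orders-api:1.0.0   # must be built for arm64
          resources:
            requests:                  # Karpenter sizes nodes from these requests
              cpu: 250m
              memory: 256Mi
```

For the annotation to take effect, the IAM role's trust policy has to trust the cluster's OIDC provider and be scoped to this namespace and ServiceAccount. That scoping is where most of the role-design effort mentioned under IAM Complexity goes.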