AWS re:Invent 2025 - Under the hood: Architecting Amazon EKS for...
🦄 Making great presentations more accessible. This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
📖 AWS re:Invent 2025 - Under the hood: Architecting Amazon EKS for scale and performance (CNS429)
In this video, AWS introduces Amazon EKS Ultra-scale Clusters supporting up to 100,000 nodes and 800,000 NVIDIA GPUs in a single cluster. The team explains architectural innovations including a reimagined etcd data store with consensus offloaded to a multi-AZ transaction journal, an in-memory database using tmpfs, and intelligent key partitioning. They also announce a provisioned control plane with three new performance tiers (XL, 2XL, 4XL) offering up to 16GB etcd capacity and predictable high performance. Anthropic's Nova DasSarma shares how Anthropic runs Claude model training on EKS ultra-scale clusters, using custom schedulers like Cartographer for workload-level scheduling and achieving 5000 GB/s throughput with S3. Performance improvements include 3X faster pod startup with AWS SOCI parallel pull and optimized networking with CNI enhancements.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Good morning, everybody. Welcome to CNS429, Under the Hood: Architecting Amazon EKS for Scale and Performance. I am Sheetal Joshi, Principal Specialist Solutions Architect for Amazon EKS here at AWS. I've been here coming up on the five-year mark. I'm going to be joined by Raghav in a few minutes here, and completing our lineup is Nova DasSarma from Anthropic, who is a Member of Technical Staff and also Head of Infrastructure at Anthropic. I just envy her job. She makes all these cool things possible, training these large Claude models at Anthropic. We are just excited to have her here on the stage with us today.
As you can see here, Kubernetes has revolutionized how applications are built and deployed since its launch. In a recent Kubernetes survey, as you can see, 93% of companies are either running it in production, evaluating it, or piloting it. Kubernetes basically gives you a declarative model that makes infrastructure management easier, and it has become the de facto standard for deploying applications in cloud native environments.
Moving forward to Amazon EKS, w
Source: Dev.to