AWS re:Invent 2025 - Balance Cost, Performance & Reliability for AI...
🦄 Making great presentations more accessible. This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
📖 AWS re:Invent 2025 - Balance cost, performance & reliability for AI at enterprise scale (AIM3304)
In this video, Jared Dean, Ankur Desai, and Deepen Mehta discuss Amazon Bedrock's inference tier options for balancing cost, performance, and reliability at enterprise scale. They introduce four tiers using an airline analogy: Reserved (private plane) for mission-critical workloads with steady traffic, Priority (first class) for spiky latency-sensitive requests at premium pricing, Standard (economy plus) for day-to-day workloads tolerating some throttling, and Flex (basic economy) for latency-tolerant agentic workloads at 50% discount. Intuit's Deepen Mehta shares how they leverage these tiers—Reserved for TurboTax's seasonal traffic, Priority/Standard for daily spikes, and Flex for non-critical experiments. All tiers support explicit prompt caching with 90% discount on cached tokens. The session includes technical implementation details and CloudWatch monitoring capabilities for optimizing token usage across different workload patterns.
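As a rough illustration of the tier guidance and discounts summarized above, here is a minimal Python sketch. It is not Bedrock API code: `Workload`, `choose_tier`, and `relative_input_cost` are hypothetical names invented for this example. It also assumes that the 50% Flex discount and the 90% cached-token discount are both measured against Standard-tier input pricing and that they stack, which the summary does not confirm. Priority and Reserved are left out of the cost estimate because the summary gives no figures for them.

```python
from dataclasses import dataclass

# Figures quoted in the session summary. Applying both against the
# Standard-tier input price, and stacking them, is an assumption.
FLEX_MULTIPLIER = 0.50    # Flex: 50% discount
CACHED_MULTIPLIER = 0.10  # cached tokens: 90% discount


@dataclass
class Workload:
    """Hypothetical description of a workload's traffic profile."""
    mission_critical: bool = False
    steady_traffic: bool = False
    spiky: bool = False
    latency_sensitive: bool = False
    latency_tolerant: bool = False


def choose_tier(w: Workload) -> str:
    """Map a workload to a tier using the rules of thumb in the summary."""
    if w.mission_critical and w.steady_traffic:
        return "Reserved"   # "private plane"
    if w.spiky and w.latency_sensitive:
        return "Priority"   # "first class"
    if w.latency_tolerant:
        return "Flex"       # "basic economy"
    return "Standard"       # "economy plus"


def relative_input_cost(total_tokens: int, cached_tokens: int, tier: str) -> float:
    """Estimate input cost in units of one Standard-priced token."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    if tier == "Standard":
        tier_multiplier = 1.0
    elif tier == "Flex":
        tier_multiplier = FLEX_MULTIPLIER
    else:
        # The summary gives no price figures for Priority or Reserved.
        raise ValueError(f"no pricing assumption available for tier {tier!r}")
    uncached_tokens = total_tokens - cached_tokens
    return tier_multiplier * (uncached_tokens + cached_tokens * CACHED_MULTIPLIER)


if __name__ == "__main__":
    batch_agent = Workload(latency_tolerant=True)
    tier = choose_tier(batch_agent)
    print(tier, relative_input_cost(1000, 800, tier))
```

Under these assumptions, a 1,000-token prompt with 800 cached tokens costs roughly the equivalent of 280 Standard-priced tokens on Standard and roughly 140 on Flex. Actual pricing depends on the model and region, so treat the numbers as a way to reason about the trade-off rather than as a quote.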
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
The title of this session is "Balance Cost, Performance, and Reliability for AI at Enterprise Scale." I'm Jared Dean, a principal solutions architect with the Bedrock team. Joining me today are Ankur Desai, who's a principal product manager with Amazon Bedrock, and Deepen Mehta, who's a senior engineering manager for AI Foundations at Intuit. We're going to discuss the various aspects of building at enterprise scale and the choices and options available from recent releases in the Amazon Bedrock portfolio.
First, we'll do some introductions. Next, we'll provide an overview of options. Then we'll look at a customer experience, which I know is always beneficial to many of you. After that, we'll go through some technical details, and then we'll have a Q&A session at the end where we'll all come on stage and be able to answer your questions.
How many of you got to Las Vegas for the conference via airplane? Perfect. I'll use an analogy that I hope will resonate with al
Source: Dev.to