AWS re:Invent 2025 - Improve Agent Quality In Production With...
🦄 Making great presentations more accessible. This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
📖 AWS re:Invent 2025 - Improve agent quality in production with Bedrock AgentCore Evaluations (AIM3348)
In this video, Amanda Lester, Vivek Singh, and Ishan Singh introduce Amazon Bedrock AgentCore Evaluations, a fully managed solution for continuous AI agent quality assessment. They explain how agents' non-deterministic nature creates trust gaps and demonstrate how AgentCore Evaluations addresses this with 13 built-in evaluators across quality dimensions like correctness, helpfulness, and tool usage, plus custom evaluator capabilities. The session covers two evaluation modes: online evaluations for continuous production monitoring and on-demand evaluations for CI/CD pipelines. Using a Wanderlust Travel Platform example, they show how the service detected tool selection accuracy dropping from 0.91 to 0.3, enabling rapid diagnosis through detailed explanations. Live demos illustrate the complete workflow from baseline testing to production deployment, emphasizing multi-dimensional success criteria and rigorous statistical analysis as best practices.
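To make the tool selection regression mentioned above concrete, here is a minimal Python sketch of how such a drop could be scored and flagged. It does not use the AgentCore Evaluations API. The `ToolCall` record, the exact-match scoring rule, the `max_drop` threshold, and the tool names are all assumptions made for illustration; the built-in evaluators described in the session may score tool usage quite differently.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One agent step: the tool the agent picked and the tool a reviewer expected.

    Hypothetical record type, not part of any AWS SDK.
    """
    selected: str
    expected: str


def tool_selection_accuracy(calls: list[ToolCall]) -> float:
    """Fraction of steps where the selected tool matches the expected one."""
    if not calls:
        raise ValueError("need at least one tool call to score")
    hits = sum(1 for c in calls if c.selected == c.expected)
    return hits / len(calls)


def has_regressed(baseline: float, current: float, max_drop: float = 0.1) -> bool:
    """True when the score fell by more than max_drop from the baseline.

    max_drop is an assumed tolerance, chosen only for this example.
    """
    return (baseline - current) > max_drop


# Hypothetical production window: 3 of 10 calls pick the expected tool.
window = [ToolCall("search_flights", "search_flights")] * 3 + [
    ToolCall("search_hotels", "search_flights")
] * 7

score = tool_selection_accuracy(window)
alert = has_regressed(baseline=0.91, current=score)
print(f"score={score:.2f} alert={alert}")  # prints "score=0.30 alert=True"
```

Comparing each window against a fixed baseline, rather than against the previous window, keeps a slow decline from going unnoticed; a real setup would also need enough samples per window for the comparison to be statistically meaningful.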
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Hello everyone and welcome to AWS re:Invent. It's great to have you all here. My name is Amanda Lester and I am the worldwide go-to-market leader for Amazon Bedrock AgentCore, and I am joined today by two of my esteemed colleagues, Vivek Singh, Senior Technical Product Manager for AgentCore, and Ishan Singh, Senior GenAI Data Scientist here at AWS. We are incredibly excited to be able to present to you today.
We're going to discuss how you can improve the quality of your agents in production with Amazon Bedrock AgentCore Evaluations, which we just recently launched during the keynote. We're incredibly excited to present to you what we have developed for agent evaluations, which we believe is going to fundamentally help you to improve the way that you do business. In today's session, you're going to learn a couple of things.
First, you're going to learn about Amazon Bedrock AgentCore. We're also going to discuss some of the key fundamental challenges
Source: Dev.to