AWS re:Invent 2025 - Building Scalable Applications With Text And...
🦄 Making great presentations more accessible. This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

📖 AWS re:Invent 2025 - Building scalable applications with text and multimodal understanding (AIM375)

In this video, Amazon AGI introduces the Amazon Nova 2.0 multimodal foundation models, which natively process text, images, video, audio, and speech. The session covers three key areas: document intelligence with optimized OCR and key information extraction; image and video understanding with temporal awareness and reasoning capabilities; and Amazon Nova Multimodal Embeddings for cross-modal search across all content types. Box's Tyan Hynes demonstrates real-world applications, including automated analysis of materials testing reports for engineering firms and continuity checks for production studios, showing how the 1 million token context window and native multimodal processing eliminate the need for separate models and manual annotation workflows.
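To make the cross-modal search idea concrete: a multimodal embedding model maps text, images, audio, and video into a single shared vector space, so a text query can be ranked against assets of any type by vector similarity. The sketch below illustrates only the retrieval step with hypothetical, hand-made embedding vectors and plain cosine similarity; the file names, vectors, and `search` helper are illustrative assumptions, not the Amazon Nova Multimodal Embeddings API.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed embeddings. In practice each vector would come
# from a multimodal embedding model that places text, images, audio, and
# video in the same space (real embeddings have hundreds of dimensions).
index = {
    "materials_report.pdf": [0.90, 0.10, 0.00],
    "site_photo.jpg":       [0.20, 0.90, 0.10],
    "call_recording.mp3":   [0.10, 0.20, 0.90],
}

def search(query_vec, index, top_k=2):
    """Rank indexed assets by similarity to the query embedding."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Embedding of a text query such as "tensile strength test results"
# (again, an illustrative vector, not real model output).
query = [0.85, 0.15, 0.05]
print(search(query, index))  # the PDF ranks first despite being a different modality than the query
```

Because every modality lives in one vector space, a single index and one nearest-neighbor lookup replace per-modality models and manual tagging, which is the workflow simplification the session highlights.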

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Good morning, everyone. First of all, thank you for being here. As a quick introduction, I'm Dinesh Rajput, a Principal Product Manager with Amazon AGI. AGI stands for Artificial General Intelligence, an organization within Amazon that builds first-party foundation models called Amazon Nova. Today's session will discuss how you can use data beyond just text, such as images, documents, videos, audio, and call recordings, to build accurate, context-aware applications with Amazon Nova foundation models. I'm also joined by my colleague Brandon Nair, who will discuss image and video understanding, and by one of our customers, Tyan Hynes, who represents Box and will share how they use Amazon Nova models to improve their AI workflows.

This is the broad agenda for today. First, we'll discuss enterprise needs when it comes to multimodal data and the different challenges that customers face. Then we'll provide a quick overview of the Amazon Nova 2.0 models that we introduced yesterday. We'll do a deep dive on how we've optimized these models for document intelligence use cases as well as visual reasoning use cases. Finally, we'll discuss multimodal embeddings.

Source: Dev.to