Tools: Building an AI-Native Retail Platform on GCP: Personalization + Multi-Agent Ops + Agentic RAG as One Unified Stack

🏗️ The Three Layers of an AI-Native Retail Platform

📐 Unified Architecture Overview

🎯 Layer 1: Real-Time Personalization Engine

The Core Problem

The Six-Stage Pipeline

Handling Cold Start

🤖 Layer 2: Multi-Agent Operations

The Core Problem

Agent Architecture

A Reorder Request, Traced End-to-End

The Pub/Sub Design: Why It Matters

Shared Memory: The agent_decision_log Table

📚 Layer 3: Agentic RAG for Retail Knowledge

The Core Problem

Three Retrieval Sources

Query Decomposition in Action

The Self-Correction Loop

🔗 How the Three Layers Connect

📊 Observability: One Dashboard, Three Layers

🚀 Where to Start

💡 Key Takeaways

A shopper searches for rain boots on your storefront. Within 120ms, your personalization engine surfaces the right products. A stock alert fires, and three AI agents coordinate a reorder without a human touching a keyboard. The customer asks a question in chat, and the answer comes back grounded in live inventory and your return policy, cited and accurate.

This is not three separate AI projects. It is one unified platform, and this article shows you how to build it on GCP.

🏗️ The Three Layers of an AI-Native Retail Platform

Most retail AI initiatives start with one use case and stop there. What turns them into a platform is designing these three capabilities together, so that they share infrastructure and data.

📐 Unified Architecture Overview

```
┌──────────────────────────────────────────────────────────┐
│                  FRONTEND / API GATEWAY                  │
└───────┬────────────────────┬────────────────────┬────────┘
        │                    │                    │
┌───────▼────────┐   ┌───────▼────────┐   ┌───────▼────────┐
│ PERSONALIZATION│   │ MULTI-AGENT    │   │ AGENTIC RAG    │
│ ENGINE         │   │ ORCHESTRATOR   │   │ (Customer Q&A) │
│ (Cloud Run)    │   │ (Gemini 1.5)   │   │ (Gemini +      │
│                │   │ (Vertex AI     │   │  Vertex Search)│
│                │   │  Reasoning)    │   │                │
└───────┬────────┘   └───────┬────────┘   └───────┬────────┘
        │                    │                    │
┌───────▼────────────────────▼────────────────────▼────────┐
│                   GOOGLE CLOUD PUB/SUB                   │
│                   (Shared Event Spine)                   │
└───────┬────────────────────┬────────────────────┬────────┘
        │                    │                    │
┌───────▼────────┐   ┌───────▼────────┐   ┌───────▼────────┐
│ Dataflow       │   │ Specialist     │   │ Vertex AI      │
│ Streaming      │   │ Agents         │   │ Search Index   │
└───────┬────────┘   └───────┬────────┘   └───────┬────────┘
        │                    │                    │
┌───────▼────────────────────▼────────────────────▼────────┐
│                         BIGQUERY                         │
│                (Shared Operational Store)                │
└──────────────────────────────────────────────────────────┘
```

The key insight: all three layers share the same data backbone. BigQuery is the source of truth, Pub/Sub is the event spine, and Vertex AI is the intelligence layer.

🎯 Layer 1: Real-Time Personalization Engine

The Core Problem

Daily batch recommendations ignore the most powerful signal available: what the user is doing right now. A shopper who just added rain boots to their cart does not want yesterday's trending sneakers.

Design principle: Personalization is a retrieval problem. Given a user and their context right now, find the items most likely to convert, in under 120ms.

The Six-Stage Pipeline

Stage 1: Event Capture (Pub/Sub)

Every user interaction fires a structured event to Pub/Sub. The client SDK is fire-and-forget: it does not wait for a response.

```json
{
  "event_type": "CART_ADD",
  "user_id": "u_8821",
  "sku_id": "SKU-4471",
  "session_id": "s_992abc",
  "ts": "2026-03-22T14:03:11Z",
  "context": {
    "device": "mobile",
    "location": "Atlanta, GA"
  }
}
```

Stage 2: Stream Enrichment (Dataflow)

A Dataflow streaming job picks up events, joins them with item metadata from BigQuery, and writes two outputs:

- Session feature update → Vertex AI Feature Store (< 5s latency)
- Interaction log → BigQuery (for offline model training)

Stage 3: Feature Assembly (Vertex AI Feature Store)

At query time, three feature groups are fetched in a single low-latency call:

```python
from google.cloud import aiplatform_v1

# Online serving client for Vertex AI Feature Store (Legacy); regional endpoint assumed
feature_store_client = aiplatform_v1.FeaturestoreOnlineServingServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)
response = feature_store_client.read_feature_values(
    request={
        # Full resource name: projects/…/locations/…/featurestores/…/entityTypes/user
        "entity_type": user_entity_type_path,
        "entity_id": user_id,  # ReadFeatureValues takes a single entity ID
        "feature_selector": {
            "id_matcher": {
                "ids": ["purchase_history", "session_clicks", "device_type", "location"]
            }
        },
    }
)
```

Stage 4: ANN Retrieval (Vertex AI Matching Engine)

The assembled user context vector is submitted to Matching Engine (now called Vertex AI Vector Search), Google's managed approximate-nearest-neighbor (ANN) index. It returns the top 50 candidate SKUs from a catalog of millions in under 10ms.

```python
from google.cloud import aiplatform

index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name=INDEX_ENDPOINT_NAME  # full resource name of your index endpoint
)
response = index_endpoint.find_neighbors(
    deployed_index_id="retail_item_embeddings",
    queries=[user_context_vector],  # one embedding per query
    num_neighbors=50,
)
```

Under the hood, the index uses Google's ScaNN algorithm and is pre-filtered by in-stock status, so the re-ranker never sees unavailable items.

Stage 5: Re-Ranking (Vertex AI Prediction)

A lightweight model re-scores the 50 candidates using signals the embedding index cannot capture:

- Current inventory level
- Promotional pricing flag
- User's price sensitivity segment
- Real-time trend score

Stage 6: Serve (Cloud Run)

The top 10 results plus display metadata are returned to the frontend. End-to-end latency: < 120ms at p99.

Handling Cold Start

New users have no purchase history, so the engine relies on context instead of history: device, time, and location are available from the very first event.

🤖 Layer 2: Multi-Agent Operations

The Core Problem

A single LLM handling all retail operations hits three walls: context overload, sequential latency, and unmaintainable prompts. When the inventory rules, pricing model, supplier contracts, and customer policies all have to fit in one context, reasoning quality degrades.

Design principle: Treat operations like a well-run team. One orchestrator receives requests and coordinates specialists, and each specialist does one thing well.

Agent Architecture

```
                Operator / System Trigger
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                   ORCHESTRATOR AGENT                   │
│                     Gemini 1.5 Pro                     │
│               Vertex AI Reasoning Engine               │
│               - Decomposes tasks                       │
│               - Routes to specialists                  │
│               - Synthesizes final response             │
└─────┬──────────────┬──────────────┬──────────────┬─────┘
      │ (Pub/Sub)    │              │              │
      ▼              ▼              ▼              ▼
┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐
│ Inventory │  │ Pricing   │  │ Supplier  │  │ Customer  │
│ Agent     │  │ Agent     │  │ Agent     │  │ Agent     │
│ BigQuery  │  │ BQ ML     │  │ Vertex AI │  │ Agentic   │
│           │  │           │  │ Search    │  │ RAG       │ ← Layer 3
└───────────┘  └───────────┘  └───────────┘  └───────────┘
```

Notice: the Customer Agent IS Layer 3. Agentic RAG is not a separate system; it is the intelligence layer of the Customer Agent. This is where the three layers connect.

A Reorder Request, Traced End-to-End

Input: "Should we reorder SKU-991?"

Step 1: Decompose. The orchestrator identifies three parallel sub-tasks.

```python
tasks = orchestrator.decompose(query)
# → [
#   {"agent": "inventory", "task": "get_stock_level",  "sku": "SKU-991"},
#   {"agent": "supplier",  "task": "get_eta_and_cost", "sku": "SKU-991"},
#   {"agent": "pricing",   "task": "get_reorder_cost", "sku": "SKU-991"}
# ]
```

Step 2: Dispatch. All three tasks are published to Pub/Sub simultaneously.

Step 3: Execute in parallel. Each Cloud Run agent handles its task independently:

```python
from google.cloud import bigquery

bq_client = bigquery.Client()

# Inventory Agent
stock = bq_client.query("""
    SELECT units_available
    FROM retail.inventory_snapshot
    WHERE sku_id = 'SKU-991' AND store_id = 'DC-ATL'
""").result()

# Pricing Agent (BigQuery ML): ML.PREDICT is a table function, so it belongs in FROM
reorder_cost = bq_client.query("""
    SELECT *
    FROM ML.PREDICT(MODEL `retail.pricing_model`,
                    (SELECT * FROM retail.pricing_signals WHERE sku_id = 'SKU-991'))
""").result()
```

Total time is the latency of the slowest agent, not the sum of all three.

```
Orchestrator → "Reorder 50 units from Vendor A at $4.20/unit, ETA 3 days.
                Current stock: 8 units (below reorder threshold of 15)." ✅
```

The Pub/Sub Design: Why It Matters

Three properties you get for free:

- Loose coupling: agents have no direct dependency on each other, only on topic names
- Fault tolerance: if an agent crashes, the unacknowledged message is retained and redelivered on recovery
- Independent scaling: each Cloud Run agent scales independently with the message volume on its own subscription

Shared Memory: The agent_decision_log Table

Every orchestrated request is fully logged:

```sql
CREATE TABLE retail.agent_decision_log (
  request_id     STRING,
  ts             TIMESTAMP,
  agent_called   STRING,
  tools_used     ARRAY<STRING>,
  input_payload  JSON,
  output_payload JSON,
  latency_ms     INT64,
  confidence     FLOAT64
);
```

This table powers weekly evaluation reports and feeds back into model fine-tuning, so your audit trail is also your training dataset.

📚 Layer 3: Agentic RAG for Retail Knowledge

The Core Problem

Standard RAG (embed query → retrieve chunks → generate) fails retail because:

- A single customer question often spans multiple knowledge domains (policy + inventory + product specs)
- Inventory data goes stale in minutes, so you cannot index it as static documents
- Retrieval confidence varies, and a system that cannot detect low-confidence answers will hallucinate

Design principle: RAG should reason, not just retrieve. The agent decides which source to query, validates the result, and cites its sources.

Three Retrieval Sources

1. Policy & Compliance Index (Vertex AI Search)
Return policies, warranty terms, BOPIS rules, hazmat shipping. Indexed as documents with hybrid retrieval (dense semantic + sparse BM25 keyword).
BM25 matters here: product part numbers and model codes are not well served by pure vector search. Hybrid retrieval handles both.

2. Product Catalog Index (Vertex AI Search)
Product descriptions, specs, compatibility notes, sizing guides. Indexed with multimodal embeddings (text + image), so a query like "waterproof jacket similar to this one" works.

3. Live Operational Data (BigQuery as a Tool)
Inventory levels, order status, real-time pricing. These are not indexed as documents but called as a live tool. This is the key architectural decision that prevents stale answers.

```python
# VertexAISearchTool and BigQueryTool are illustrative wrappers; substitute the
# tool classes your agent framework provides.
tools = [
    VertexAISearchTool(index="retail_policy_index"),
    VertexAISearchTool(index="retail_product_index"),
    BigQueryTool(query_template=INVENTORY_QUERY),  # live call, not indexed
]
```

Query Decomposition in Action

Customer query: "Can I return the 40V battery I bought online at a store, and is it in stock at the Cumming, GA location?"

```
Agent Plan:
  Sub-query A → Policy Index: "online purchase battery return policy in-store"
  Sub-query B → BigQuery Tool: SELECT units_available FROM inventory_snapshot
                WHERE sku_id='SKU-4471' AND store_id='GA-CUMMING'
```

Agent validates Sub-query A: relevance score > 0.82 threshold ✅
Agent validates Sub-query B: live data, timestamp 2 minutes ago ✅

```
"Yes. Online purchases can be returned in-store within 90 days (Policy §3.2).
The 40V battery (SKU-4471) shows 3 units in stock at Cumming, GA as of 14:07 ET today."
```

Every fact is cited, so each claim can be traced back to its source. No "please check the website."

The Self-Correction Loop

```python
MAX_RETRIES = 3
THRESHOLD = 0.82  # same relevance threshold as in the validation step above

def retrieve_with_self_correction(query, index_id):
    # vertex_search, agent, and escalate_to_human are placeholders for your own clients
    original_query = query
    for attempt in range(MAX_RETRIES):
        result = vertex_search.retrieve(query, index=index_id)
        if result.confidence_score >= THRESHOLD:
            return result
        # Reformulate: broaden scope, try synonyms, switch retrieval mode
        query = agent.reformulate(query, attempt)
    # After max retries: escalate to the human agent queue
    return escalate_to_human(original_query)
```

This loop means your system knows what it does not know, and routes accordingly.

🔗 How the Three Layers Connect

The platform is unified, not assembled. Here is how data and events flow across all three layers in a single customer session:

```
1. Customer browses
   → Pub/Sub event
   → Personalization Engine surfaces relevant products (Layer 1)

2. Inventory drops below threshold
   → Pub/Sub alert
   → Orchestrator Agent dispatches reorder across 3 specialist agents in parallel (Layer 2)

3. Customer asks: "Is this in stock?"
   → Customer Agent (Layer 2)
   → Agentic RAG (Layer 3) queries BigQuery live + policy index
   → grounded, cited answer in < 2s

4. All events → BigQuery agent_decision_log + interaction_log
   → weekly eval reports + model retraining for Layers 1 & 3
```

The feedback loop is the platform. Every interaction feeds the weekly evaluations and the retraining of the Layer 1 and Layer 3 models.

📊 Observability: One Dashboard, Three Layers

All three layers write to BigQuery, so one Looker Studio dashboard covers the full platform. When retrieval precision drops, you know before customers notice.

🚀 Where to Start

Don't try to ship all three layers at once. Here is a suggested sequencing:

Week 1–4: Lay the data foundation
- Set up BigQuery tables: inventory_snapshot, interaction_log, agent_decision_log
- Stand up the Pub/Sub topics and the Dataflow streaming job
- This infrastructure is shared by all three layers: do it once, use it everywhere

Week 5–8: Ship Personalization (Layer 1)
- Train a two-tower model on BigQuery interaction history
- Index item embeddings into Vertex AI Matching Engine
- Wire up the Cloud Run serving API
- Measure: recommendation CTR vs. batch baseline

Week 9–12: Add Multi-Agent Ops (Layer 2)
- Start with two agents: Inventory + Pricing
- Run the orchestrator on Vertex AI Reasoning Engine
- Add the Supplier Agent once the first two are stable

Week 13–16: Add Agentic RAG (Layer 3)
- Index the return policy + product catalog into Vertex AI Search
- Wire the BigQuery inventory tool into the agent
- Deploy it as the Customer Agent inside your multi-agent system

Because every layer talks over the Pub/Sub bus, each new layer plugs in without touching what already works.

💡 Key Takeaways

- Share infrastructure, not code. BigQuery and Pub/Sub serve all three layers. Build them once.
- The Customer Agent IS Agentic RAG. Don't build these as separate projects.
- The agent_decision_log is your most valuable table. It is your audit trail, your eval dataset, and your retraining signal.
- Personalization cold start is solved by context, not history. Device + time + location gets you 80% of the way there for new users.
- Hybrid retrieval beats pure vector search for retail. BM25 handles part numbers and model codes that semantic search misses.
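The Stage 1 event contract from Layer 1 can be made concrete in code. Below is a minimal sketch that models the example payload as a typed Python object and encodes it to the raw bytes a Pub/Sub message carries. The field names come from the example event. The class name and the timestamp check are illustrative assumptions, not part of any Google library. On the server side, the resulting bytes could be handed to `google-cloud-pubsub`'s `PublisherClient.publish(topic_path, data)`. Not waiting on the returned future is what keeps that call fire-and-forget.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass
class InteractionEvent:
    """Typed version of the Stage 1 event payload (field names match the example JSON)."""

    event_type: str
    user_id: str
    sku_id: str
    session_id: str
    ts: str  # UTC timestamp in the form 2026-03-22T14:03:11Z
    context: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject malformed timestamps early; strptime raises ValueError on a mismatch
        datetime.strptime(self.ts, "%Y-%m-%dT%H:%M:%SZ")

    def to_pubsub_data(self) -> bytes:
        # Pub/Sub message data is raw bytes, so serialize the event as UTF-8 JSON
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_pubsub_data(cls, data: bytes) -> "InteractionEvent":
        # Inverse of to_pubsub_data, as a subscriber such as the Dataflow job would decode it
        return cls(**json.loads(data.decode("utf-8")))
```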
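The Stage 2 enrichment step from Layer 1 can be sketched as plain Python. This is not Dataflow or Apache Beam code. It is the per-event logic a streaming transform might run, under two stand-ins: a dict replaces the BigQuery item metadata, and an in-memory deque per session replaces streaming state. `SessionEnricher`, `RECENT_N`, and the output field names other than `session_clicks` are hypothetical.

```python
from collections import defaultdict, deque

RECENT_N = 5  # hypothetical: how many recent SKUs to keep per session


class SessionEnricher:
    """Per-event logic for Stage 2, written as plain Python (not Apache Beam)."""

    def __init__(self, item_metadata: dict):
        # sku_id -> {"category": ..., "price": ...}; stands in for the BigQuery item table
        self.item_metadata = item_metadata
        # session_id -> most recent SKUs, oldest dropped first (stands in for streaming state)
        self.recent = defaultdict(lambda: deque(maxlen=RECENT_N))

    def process(self, event: dict) -> tuple:
        meta = self.item_metadata.get(event["sku_id"], {})

        # Output 1: enriched row for the BigQuery interaction log (offline training)
        log_row = {**event, "category": meta.get("category"), "price": meta.get("price")}

        # Output 2: session feature update for the online feature store
        window = self.recent[event["session_id"]]
        window.append(event["sku_id"])
        feature_update = {"entity_id": event["user_id"], "session_clicks": list(window)}

        return feature_update, log_row
```

Both outputs come from a single pass over each event. This mirrors the "writes two outputs" design: the online path and the offline path never diverge on what was observed.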
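Stage 4's pre-filtering idea, where only in-stock items are ever candidates, is easiest to see with an exact search. The sketch below computes a brute-force cosine-similarity top-k over in-stock items only. It is a stand-in for illustration, not ScaNN, which is approximate and designed for catalogs of millions. The function and parameter names are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(query_vec, item_vecs, in_stock, k=50):
    """Exact top-k SKUs by cosine similarity, considering in-stock items only.

    item_vecs: sku_id -> embedding; in_stock: sku_id -> bool.
    """
    scored = (
        (cosine(query_vec, vec), sku)
        for sku, vec in item_vecs.items()
        if in_stock.get(sku, False)  # pre-filter: unavailable items are never scored
    )
    # nlargest keeps only k items in memory instead of sorting the whole catalog
    return [sku for _, sku in heapq.nlargest(k, scored)]
```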
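The Stage 5 re-ranker combines the ANN similarity with the business signals listed for that stage. The article describes a trained lightweight model. The hand-weighted linear score below is a simplified stand-in that only shows how those signals can reorder candidates. The weights, the low-stock cutoff of 3 units, and the `Candidate` fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sku_id: str
    similarity: float     # score from the ANN retrieval stage
    units_in_stock: int   # current inventory level
    on_promotion: bool    # promotional pricing flag
    trend_score: float    # real-time trend score, assumed to be in [0, 1]


# Hypothetical weights; a trained re-ranking model would learn these instead
WEIGHTS = {"similarity": 1.0, "trend": 0.25, "low_stock_penalty": 0.3, "promo": 0.2}
LOW_STOCK_UNITS = 3  # hypothetical cutoff below which an item is demoted


def rerank(candidates, price_sensitive: bool, top_n: int = 10):
    """Return the top_n SKU ids ordered by a simple linear score."""

    def score(c: Candidate) -> float:
        s = WEIGHTS["similarity"] * c.similarity + WEIGHTS["trend"] * c.trend_score
        if c.units_in_stock < LOW_STOCK_UNITS:
            s -= WEIGHTS["low_stock_penalty"]
        if c.on_promotion and price_sensitive:  # promotions only boost price-sensitive users
            s += WEIGHTS["promo"]
        return s

    return [c.sku_id for c in sorted(candidates, key=score, reverse=True)[:top_n]]
```

The same candidate list can come out in a different order for a price-sensitive user. That per-user reshuffling is something the shared embedding index cannot do on its own.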
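The reorder trace claims that total time tracks the slowest agent rather than the sum. A small `asyncio` sketch can check this. `call_agent` is a hypothetical stand-in that sleeps instead of publishing to Pub/Sub and awaiting a reply. The latencies are made up for the demo.

```python
import asyncio
import time


async def call_agent(agent: str, task: str, latency_s: float) -> dict:
    # Stand-in for "publish to the agent's topic, then await its reply"
    await asyncio.sleep(latency_s)
    return {"agent": agent, "task": task, "status": "done"}


async def fan_out(tasks: list) -> list:
    """Run every sub-task concurrently; results come back in the order of `tasks`."""
    return await asyncio.gather(
        *(call_agent(t["agent"], t["task"], t["latency_s"]) for t in tasks)
    )


if __name__ == "__main__":
    demo = [
        {"agent": "inventory", "task": "get_stock_level", "latency_s": 0.10},
        {"agent": "supplier", "task": "get_eta_and_cost", "latency_s": 0.20},
        {"agent": "pricing", "task": "get_reorder_cost", "latency_s": 0.15},
    ]
    start = time.perf_counter()
    results = asyncio.run(fan_out(demo))
    # Elapsed is close to 0.20s (the slowest agent), not 0.45s (the sum)
    print(f"{len(results)} results in {time.perf_counter() - start:.2f}s")
```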
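Once the three specialists report back, the orchestrator's recommendation for SKU-991 follows a simple rule. In the article, Gemini writes the final response. The deterministic function below only sketches the decision logic, assuming the reorder threshold and reorder quantity arrive as inputs. `Quote` and `synthesize_reorder` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    vendor: str
    unit_cost: float
    eta_days: int


def synthesize_reorder(sku: str, units_available: int, reorder_threshold: int,
                       reorder_qty: int, quote: Quote) -> Optional[str]:
    """Combine inventory, supplier, and pricing results; None means no reorder is needed."""
    if units_available >= reorder_threshold:
        return None
    return (
        f"Reorder {reorder_qty} units of {sku} from {quote.vendor} "
        f"at ${quote.unit_cost:.2f}/unit, ETA {quote.eta_days} days. "
        f"Current stock: {units_available} units "
        f"(below reorder threshold of {reorder_threshold})."
    )
```

Keeping the threshold check in plain code like this, with the LLM only phrasing the result, is one way to make the reorder decision itself easy to audit in the decision log.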
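The fault-tolerance property in the Pub/Sub design has a practical consequence. By default Pub/Sub delivers messages at least once, so a redelivered message can reach an agent that already handled it. A common remedy is to make each agent idempotent on the request ID. The sketch keeps results in memory purely for illustration. `IdempotentAgent` is a hypothetical name, and a production agent would need durable storage shared across instances.

```python
from typing import Callable


class IdempotentAgent:
    """Wraps an agent handler so each request_id is processed at most once."""

    def __init__(self, handler: Callable[[dict], object]):
        self.handler = handler
        # request_id -> cached result; in-memory only for illustration
        self.results: dict = {}

    def on_message(self, message: dict):
        request_id = message["request_id"]
        if request_id not in self.results:
            self.results[request_id] = self.handler(message)
        # A redelivered message gets the cached result instead of re-running the handler
        return self.results[request_id]
```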
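Writing to the `agent_decision_log` table starts with a row that matches its schema. The helper below builds such a row as a plain dict. Serializing the two payload columns to JSON text is an assumption about your insert path, and the function name is hypothetical. A list of these rows could then be passed to the BigQuery Python client's `insert_rows_json(table, rows)`.

```python
import json
import uuid
from datetime import datetime, timezone


def make_decision_log_row(agent_called, tools_used, input_payload, output_payload,
                          latency_ms, confidence, request_id=None) -> dict:
    """Build one row shaped like retail.agent_decision_log."""
    return {
        "request_id": request_id or str(uuid.uuid4()),  # generate an ID if none was given
        "ts": datetime.now(timezone.utc).isoformat(),   # timezone-aware UTC timestamp
        "agent_called": agent_called,
        "tools_used": list(tools_used),
        # Payloads serialized to JSON text (assumed insert format for the JSON columns)
        "input_payload": json.dumps(input_payload),
        "output_payload": json.dumps(output_payload),
        "latency_ms": int(latency_ms),                   # INT64 column
        "confidence": float(confidence),                 # FLOAT64 column
    }
```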
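The policy-index note says BM25 matters for part numbers. The small pure-Python sketch below shows why, using Okapi BM25 plus reciprocal rank fusion (RRF). RRF is one common way to merge a keyword ranking with a vector ranking. This is not how Vertex AI Search implements hybrid retrieval internally. The tokenizer, the three documents, and the pretend vector ranking in the demo are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Keep hyphenated codes such as "sku-4471" together as single tokens
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank doc ids (docs: id -> text) by Okapi BM25 score, highest first."""
    toks = {d: tokenize(t) for d, t in docs.items()}
    n = len(toks)
    avgdl = sum(len(t) for t in toks.values()) / n
    df = Counter(term for t in toks.values() for term in set(t))  # document frequency
    scores = {}
    for d, t in toks.items():
        tf = Counter(t)
        s = 0.0
        for q in tokenize(query):
            if q in tf:
                idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
                s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores[d] = s
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists; each doc earns 1 / (k + rank) from every list it appears in."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [d for d, _ in fused.most_common()]


if __name__ == "__main__":
    docs = {
        "p1": "Battery returns: online purchases can be returned in store within 90 days",
        "p2": "Hazmat shipping rules for lithium batteries",
        "p3": "Compatibility: SKU-4471 40V battery fits all 40V tools",
    }
    keyword = bm25_rank("SKU-4471 compatibility", docs)  # exact code match wins
    vector = ["p1", "p2", "p3"]  # pretend a vector index ranked the code match last
    print(keyword, reciprocal_rank_fusion([keyword, vector]))
```

In the demo, BM25 puts the document containing the exact part number first. Fusing that with a vector ranking that placed it last lifts it from third to second, which is the "handles both" effect the article describes.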
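The query decomposition example runs two validation checks before answering. One is a relevance threshold for the policy hit; the other is a freshness check for the live inventory result. Both can be written as small predicates. The 0.82 threshold comes from the example. The five-minute staleness budget and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RELEVANCE_THRESHOLD = 0.82            # from the Sub-query A check
MAX_STALENESS = timedelta(minutes=5)  # hypothetical freshness budget for live data


def validate_policy_hit(relevance_score: float) -> bool:
    """Sub-query A: accept a policy passage only above the relevance threshold."""
    return relevance_score > RELEVANCE_THRESHOLD


def validate_live_data(as_of: datetime, now: datetime) -> bool:
    """Sub-query B: accept live data only if it is recent (both datetimes timezone-aware)."""
    age = now - as_of
    return timedelta(0) <= age <= MAX_STALENESS


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(validate_policy_hit(0.9), validate_live_data(now - timedelta(minutes=2), now))
```

A result that fails either check is a natural trigger for the self-correction loop rather than an answer.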