Designing Agentic AI Systems: How Real Applications Combine Patterns, Not Hype

Source: Dev.to

Most explanations of AI agent patterns are either too abstract to be useful or too simplified to be accurate. This guide attempts to be both technically precise and genuinely easy to understand by grounding each pattern in a human behavior most engineers, architects, and product leaders already know well.

## The Foundation: Two Operating Models of AI Systems

Before discussing agent patterns, we need to establish a distinction that quietly determines almost every architectural decision you will make.
Not all AI systems operate the same way. In practice, modern LLM systems fall into two operating models, defined by where control lives. Understanding this boundary is essential because it shapes reliability, safety, observability, testing strategy, and governance.

## 1. Agentic Workflows: Intelligence Inside Deterministic Systems

In an agentic workflow, the system is fundamentally code-driven. The LLM is invoked at specific points to perform bounded tasks such as interpretation, generation, classification, or reasoning, but it operates within a structure defined by deterministic software. The execution path is known ahead of time, and the system behaves like a controlled pipeline augmented with probabilistic intelligence.

You can think of this as: a deterministic system that calls an LLM as a capability. This model aligns with how most production AI systems are built today, including RAG pipelines, prompt chains, tool-augmented services, and orchestrated workflows.

## 2. Autonomous Agents: Goal-Driven Adaptive Systems

In an autonomous agent, control shifts. Instead of code prescribing each step, the system provides a set of tools, constraints or policies, and an environment to observe. Execution emerges dynamically through an iterative loop, often described in the literature as Reason → Act → Observe (ReAct). There is no predefined sequence beyond high-level boundaries.

You can think of this as: a goal-driven system where the model determines the workflow at runtime. This approach appears in research agents, exploration systems, coding agents, investigative assistants, and adaptive planning environments.

## Why This Distinction Matters

Choosing between an agentic workflow and an autonomous agent changes how you design reliability, testing, monitoring, and governance.

👉 If code controls the flow, you manage risk through software engineering.

👉 If the model controls decisions, you manage risk through evaluation and guardrails.

Where control sits defines where problems appear. Example: a RAG pipeline returns the wrong documents, so the answer is wrong. The root cause is traceable.
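The two control models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a framework: `call_llm` is a hypothetical stub standing in for a real model API, and the tool registry is just a dict.

```python
# Sketch of the two operating models. `call_llm` is a stand-in stub here;
# in a real system it would call an actual model API.

def call_llm(prompt: str) -> str:
    # Stub: returns canned decisions so the sketch is runnable end to end.
    if "Classify" in prompt:
        return "bug"
    if "Next tool name" in prompt:
        return "FINISH"
    return f"[LLM output for: {prompt[:40]}]"

def workflow_handle(ticket: str) -> str:
    """Agentic workflow: code owns the control flow; the LLM fills in bounded steps."""
    category = call_llm(f"Classify this ticket as 'bug' or 'question': {ticket}")
    if "bug" in category.lower():          # branching logic lives in deterministic code
        return call_llm(f"Draft a bug triage summary for: {ticket}")
    return call_llm(f"Draft a reply to: {ticket}")

def agent_loop(goal: str, tools: dict, max_steps: int = 8) -> str:
    """Autonomous agent: the model chooses each action (Reason -> Act -> Observe)."""
    history = []
    for _ in range(max_steps):             # a hard boundary, not a predefined path
        decision = call_llm(f"Goal: {goal}\nHistory: {history}\n"
                            f"Tools: {list(tools)}\nNext tool name, or FINISH?").strip()
        if decision == "FINISH":
            break
        observation = tools.get(decision, lambda: "unknown tool")()
        history.append((decision, observation))   # the observation feeds the next step
    return call_llm(f"Goal: {goal}\nHistory: {history}\nFinal answer:")
```

Notice that in the first function every possible path is visible in the source; in the second, the path only exists at runtime, in `history`.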
## Mental Model (Simple)

Think of it like this:

| System | Analogy |
| --- | --- |
| Agentic workflow | Train on tracks |
| Autonomous agent | Explorer in wilderness |

Train = safe, predictable. Explorer = powerful, uncertain.

## Real Enterprise Impact

This decision affects architecture complexity, cost control, production stability, incident response, compliance posture, and operational maturity. Many teams underestimate this and get surprised later. Workflows reduce uncertainty by design. Agents embrace uncertainty to gain capability.

## Foundational Capabilities Across All Patterns

Before diving into individual patterns, note that modern agentic systems rely on a set of shared primitives:

- **Tools**: mechanisms that allow models to interact with systems such as APIs, databases, workflows, messaging, and code execution. Tools turn reasoning into action.
- **A2A (agent-to-agent communication)**: mechanisms for agents to collaborate, delegate, and exchange results, critical for multi-agent systems and orchestrations.
- **Memory layers**: STM (short-term memory) holds session context such as conversation history and current task state; LTM (long-term memory) holds persistent knowledge such as user preferences, historical interactions, embeddings, and knowledge graphs.

## Pattern 1: Augmented LLM

**What it is (technical).** A plain LLM has three built-in limits: frozen knowledge (training time only), no durable memory (unless you provide it), and no actions (it only generates text). The Augmented LLM pattern fixes this by equipping the model at runtime with:

- **Retrieval (RAG)**: pull relevant documents/records and inject them into context before answering.
- **Tools**: let the model call functions (APIs, DB queries, calculators, code execution).
- **Memory**: persist useful context across turns/sessions (STM in the window; LTM in external storage such as a vector DB, knowledge graph, or profile store).

**Human equivalent.** A specialist (doctor, lawyer, analyst) isn't powerful because of "brain only." They're powerful because they have the client file (retrieval), live systems (tools), and prior notes (memory). An Augmented LLM is that same upgrade: a model with a desk, not a model in isolation.

## Pattern 2: Durable Agent

**What it is (technical).** Most LLM interactions are short-lived, lasting seconds or minutes. But real workflows span days or weeks, require approvals, must survive failures, and need audit trails. A Durable Agent wraps an AI system in a persistent execution layer that checkpoints state after each step, supports pause/resume, retries safely, and tracks full history.

**Human equivalent.** A loan approval process. It doesn't restart because someone went on vacation; it resumes exactly where it paused.

## Pattern 3: Prompt Chaining

**What it is (technical).** A complex task is broken into sequential steps. Each step performs a focused task, produces structured output, and is validated before moving forward.

## Pattern 4: Evaluator & Optimizer

**What it is (technical).** Introduce a feedback loop: generate output, evaluate it against criteria, improve based on feedback, and repeat until acceptable.

**Human equivalent.** A writer and editor iterating on drafts.
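The evaluator-optimizer loop can be sketched as follows. `generate` and `evaluate` are hypothetical stand-ins (a real evaluator would typically be a second model call scoring against an explicit rubric); the acceptance threshold and iteration cap are illustrative.

```python
# Sketch of the evaluator-optimizer loop with stand-in generate/evaluate
# helpers. The rubric, scores, and threshold here are illustrative only.

def evaluate(draft: str) -> tuple[int, str]:
    # Stand-in evaluator: a real system would ask a second model to score
    # the draft against an explicit rubric (completeness, tone, accuracy).
    score = 10 if "summary" in draft and "risks" in draft else 4
    feedback = "OK" if score >= 8 else "Missing a risks section."
    return score, feedback

def generate(task: str, feedback: str = "") -> str:
    # Stand-in generator: a real system would call the writer model here.
    draft = f"summary of {task}"
    if feedback and "risks" in feedback.lower():
        draft += " with risks"
    return draft

def evaluator_optimizer(task: str, max_iters: int = 3, threshold: int = 8) -> str:
    draft = generate(task)
    for _ in range(max_iters):              # cap iterations to bound cost
        score, feedback = evaluate(draft)
        if score >= threshold:              # acceptance criterion from the rubric
            return draft
        draft = generate(task, feedback)    # revise using evaluator feedback
    return draft                            # best effort after the budget is spent
```

The iteration cap matters: without it, an evaluator that never quite approves can loop indefinitely, which is exactly the cost and drift risk this pattern's design notes warn about.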
**Key design notes.** Define a clear evaluation rubric, and watch for evaluator bias.

## Pattern 5: Autonomous Agent

**What it is (technical).** The model controls its own loop, deciding at each step what to do next.

**Human equivalent.** A detective following leads.

**Key design notes.** Enforce action budgets, and require approval for risky actions.

## Pattern 6: Parallelization

**What it is (technical).** Independent subtasks run concurrently.

## Pattern 7: Routing

**What it is (technical).** A classifier directs requests to specialized handlers.

## Pattern 8: Orchestrator & Workers

**What it is (technical).** A coordinator decomposes tasks and assigns them to specialists.

## How These Patterns Come Together in Real Systems

These patterns aren't competing approaches; they're building blocks. In production, they're layered deliberately, each solving a different class of problem.

Take a contract review system for a legal team. A routing layer sits at the front, classifying incoming documents (NDA, employment agreement, vendor contract, regulatory filing) and directing each to the appropriate processing path. Behind that, each path runs as a prompt chain: one step extracts clauses and metadata, another compares them against standard templates, and a third generates a risk summary. Between steps, code validates outputs to prevent errors from propagating.

When agreements become complex, for example multi-party contracts, the workflow invokes an orchestrator-workers pattern. Specialized workers analyze indemnification, jurisdiction, termination rights, and other domains independently, and their findings are synthesized into a unified assessment.

Every model call operates as an augmented LLM, grounded with retrieval from contract libraries and connected to internal systems through tools. Before results are delivered, an evaluator-optimizer loop checks the output against defined quality criteria, ensuring completeness, correctness, and appropriate risk classification.

All of this runs within a durable execution layer. If partner review is required, the system pauses, waits, and resumes later without losing state or restarting the process.

One system. Multiple patterns. Each contributing a specific capability the others don't provide.
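The orchestrator-workers step in the contract-review example might be sketched like this, with hypothetical specialist functions standing in for real model calls. Because the workers are independent, parallelization falls out naturally.

```python
# Sketch of orchestrator-workers for the contract-review example. The worker
# functions are stand-ins; a real system would run specialist prompts here.
from concurrent.futures import ThreadPoolExecutor

def analyze_indemnification(contract: str) -> str:
    return f"indemnification findings for {contract}"   # stand-in specialist

def analyze_jurisdiction(contract: str) -> str:
    return f"jurisdiction findings for {contract}"      # stand-in specialist

def analyze_termination(contract: str) -> str:
    return f"termination findings for {contract}"       # stand-in specialist

WORKERS = [analyze_indemnification, analyze_jurisdiction, analyze_termination]

def review_contract(contract: str) -> str:
    # Workers are independent, so they can run concurrently (parallelization);
    # the orchestrator then synthesizes their findings into one assessment.
    with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
        findings = list(pool.map(lambda worker: worker(contract), WORKERS))
    return "Assessment:\n- " + "\n- ".join(findings)
```

The aggregation step is where the design notes about conflicts apply: if two workers reach contradictory conclusions, the synthesis logic must detect that rather than silently concatenate.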
## Where to Begin

A common mistake in agentic system design is starting with the most sophisticated pattern instead of the most appropriate one. Autonomous agents are compelling in demos, but in production they introduce governance, observability, and reliability challenges that many teams underestimate. You don't need all patterns. In fact, most systems shouldn't use all of them. The real goal is simpler: apply the smallest set of patterns that delivers reliability, clarity, and operational confidence for the problem you're solving.

Thanks, Sreeni Ramadorai

## Failure Modes: How Things Break

**Workflows.** Engineers define the sequence of steps, the branching logic, the failure handling, and the termination conditions, so failures usually come from traditional engineering issues:

- Missing logic branches
- Incorrect orchestration
- Bad retrieval results
- API failures
- Integration bugs
- Incorrect assumptions coded into the flow

**Agents.** The system provides a set of tools, constraints or policies, and an environment to observe; the LLM then decides what action to take, which tool to use, how to interpret outcomes, and when to continue or stop. Failures therefore come from cognitive behavior:

- Model misunderstands the goal
- Takes unnecessary actions
- Gets stuck in loops
- Hallucinates tool usage
- Makes unsafe decisions
- Drifts from the original objective

Example: an agent keeps calling tools repeatedly, trying to "improve" its answer. The root cause is emergent.

## Testing Strategy: How You Validate Systems

**Workflows.** You can test like traditional software, because the same input follows the same path:

- Integration tests
- Regression tests
- Deterministic scenarios
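Deterministic workflow testing can look like ordinary unit testing once the bounded LLM call is stubbed. The `classify` stub and the pipeline below are illustrative, not a specific framework.

```python
# Sketch of testing a workflow like traditional software: with the LLM call
# stubbed, the same input must always follow the same path.

def classify(ticket: str) -> str:
    # Stand-in for a bounded LLM call; stubbed deterministically for tests.
    return "bug" if "crash" in ticket else "question"

def pipeline(ticket: str) -> list[str]:
    steps = ["received"]
    if classify(ticket) == "bug":
        steps += ["triage", "assign_engineer"]
    else:
        steps += ["draft_reply"]
    return steps

# Deterministic regression tests: exact path assertions are valid here.
assert pipeline("app crash on login") == ["received", "triage", "assign_engineer"]
assert pipeline("how do I export data?") == ["received", "draft_reply"]
```

Exact-path assertions like these are precisely what stops working for autonomous agents, which is why agent validation shifts to behavioral evaluation.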
**Agents.** You test like behavioral systems, because the same input may produce different actions:

- Simulation environments
- Evaluation datasets
- Adversarial testing
- Monte Carlo runs: running the agent many times with slight variations or randomness to observe behavior across scenarios and uncover edge cases
- Human review

## Observability: What You Need to Monitor

**Workflows.** Logs are enough: step execution and API responses. You follow the pipeline.

**Agents.** You need deeper insight: reasoning traces, decision trees, memory state, goal progress, and action outcomes. You monitor behavior, not just execution.

## Governance and Safety: How You Control Risk

**Workflows.** You enforce rules in code: hard guardrails, approval steps, validation checks, compliance rules. The system cannot deviate.

**Agents.** You enforce policies around behavior: tool permissions, budget limits, action constraints, kill switches, human oversight, policy engines. The system can explore, within boundaries.

## Determinism vs. Adaptability: The Tradeoff

**Workflows optimize for:** predictability, repeatability, reliability, auditability.

**Agents optimize for:** exploration, problem solving, ambiguity handling, learning-like behavior. Best for: coding assistants, investigations.

These differences surface across architecture complexity, cost control, production stability, incident response, compliance posture, and operational maturity.

**Pattern 1 (Augmented LLM), key design notes.** A plain LLM's limits are frozen knowledge (training time only), no durable memory (unless you provide it), and no actions (it only generates text); the augmentation supplies the client file (retrieval), live systems (tools), and prior notes (memory). Two things to get right:

- Retrieval quality is the ceiling. Garbage context → confident wrong answers.
- Tool schemas must be crystal-clear. Ambiguous tools create silent, hard-to-debug failures.
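Two of the behavioral guardrails named above, an action budget and a human-approval gate for risky tools, can be sketched in a few lines. The tool names and limits here are illustrative assumptions, not a specific policy engine.

```python
# Sketch of agent-side guardrails: an action budget (a crude kill switch)
# and a human-approval gate for risky tools. Names are illustrative.

RISKY_TOOLS = {"wire_transfer", "delete_records"}

class BudgetExceeded(Exception):
    pass

class GuardedExecutor:
    def __init__(self, max_actions: int = 20):
        self.max_actions = max_actions
        self.actions_taken = 0

    def execute(self, tool_name: str, tool_fn, approved: bool = False):
        self.actions_taken += 1
        if self.actions_taken > self.max_actions:      # budget as kill switch
            raise BudgetExceeded(f"action budget of {self.max_actions} exhausted")
        if tool_name in RISKY_TOOLS and not approved:  # human oversight gate
            raise PermissionError(f"{tool_name} requires human approval")
        return tool_fn()
```

The key design point is that these checks live outside the model: the agent can propose any action, but the executor, which the model cannot rewrite, decides whether it runs.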
**Pattern 2 (Durable Agent).** Real workflows span days or weeks, require approvals, must survive failures, and need audit trails; the durable layer checkpoints state after each step, supports pause/resume, retries safely, and tracks full history. Typical engines: Durable Functions, Step Functions, and other workflow engines. Key design notes: idempotency is critical (avoid duplicate actions); plan schema evolution early; track execution lineage for auditability.

**Pattern 3 (Prompt Chaining).** Each step performs a focused task, produces structured output, and is validated before moving forward. This improves reliability and observability. Human equivalent: a factory assembly line, where each station does one job, not everything. Key design notes: prevent error propagation with validation; keep step outputs structured; avoid passing unnecessary context.

**Pattern 4 (Evaluator & Optimizer).** Generate output, evaluate it against criteria, improve based on feedback, and repeat until acceptable. Human equivalent: a writer and editor iterating drafts. Key design notes: define a clear evaluation rubric; limit iterations; watch for evaluator bias.

**Pattern 5 (Autonomous Agent).** The model decides the next action and updates its plan; there is no fixed path. Human equivalent: a detective following leads. Key design notes: enforce action budgets; require approval for risky actions; log everything.

**Pattern 6 (Parallelization).** Independent subtasks run concurrently. Human equivalent: a team dividing work. Key design notes: ensure independence; design aggregation carefully; watch cost spikes.

**Pattern 7 (Routing).** Human equivalent: a hospital triage nurse. Key design notes: measure routing accuracy; define a fallback path; tune confidence thresholds.

**Pattern 8 (Orchestrator & Workers).** Human equivalent: a general contractor managing trades. Key design notes: define worker contracts; detect conflicts; avoid over-fragmentation.

In practice, the most effective approach is evolutionary:

- Start with an augmented LLM so your system has the right context, tools, and grounding.
- Introduce prompt chaining when tasks naturally break into sequential steps.
- Add routing when different request types require different handling strategies.
- Use parallelization when independent work can improve throughput.
- Introduce evaluator loops when output quality must be consistently enforced.
- Adopt orchestrator-workers when problems require multiple specialized perspectives.
- Wrap workflows in durable execution when processes span time or involve human checkpoints.
- Explore autonomous agents selectively for open-ended subtasks, with clear limits and safeguards.