# Building a RAG based agent using DronaHQ


Source: Dev.to

RAG and agentic RAG are often discussed at a high level, which makes them sound either overly academic or unrealistically autonomous. In practice, most useful systems sit somewhere in between.

This post breaks the topic down clearly: what RAG actually is, what agentic RAG adds on top, what agentic RAG looks like in real business operations, and how I built a RAG-based agent for internal ops without writing any code. This RAG agent is based on an actual implementation I built for our marketing-sales enablement function. If you'd like to exchange ideas, let's connect!

## What is retrieval augmented generation (RAG)

Retrieval augmented generation is a pattern where a language model generates responses using external data retrieved at runtime, instead of relying only on its training knowledge. At a minimum, a RAG system has three parts:

- A corpus of source material that represents ground truth.
- A retriever that selects relevant pieces of that corpus based on a user query.
- A generator that produces an answer using only the retrieved context.

The key property of RAG is grounding. The model is constrained by what it retrieves. If the information does not exist in the corpus, the system should either say it does not know or ask for clarification.

In business settings, RAG is most valuable when accuracy matters more than creativity. Internal documentation, customer stories, policies, analytics, and transcripts are all natural fits because they are bounded and auditable.

## What agentic RAG adds on top of RAG

Agentic RAG builds on the same retrieval foundation, but adds decision-making and iteration. Instead of a single retrieve-and-generate step, an agentic RAG system can plan its work, adapt retrieval based on intermediate results, verify outputs, and take actions across tools. Retrieval still grounds the system, but the agent decides how and when retrieval happens.

An important distinction: agentic RAG does not require full autonomy or open-ended reasoning. In most enterprise use cases, agentic behavior is narrow, intentional, and bounded. Planning, verification, and clarification loops matter more than long reasoning chains.

A useful way to think about it: RAG answers questions. Agentic RAG completes tasks.

## What agentic RAG realistically looks like in business operations

In practice (so far), agentic RAG systems do not behave like general assistants. They behave like focused workers with a clear job.

Think about a vendor renewal in a finance or operations team. The information needed to make a decision lives in many places: contracts in shared drives, usage data in dashboards, email threads where exceptions were discussed.

A basic RAG system can answer a question like “What does the contract say about renewal?” by retrieving and summarizing a clause. An agentic RAG system does more. When asked to prepare a renewal summary, it first figures out what evidence is required. It retrieves renewal terms from the contract, pulls recent usage metrics, and searches for past exception notes. If any piece is missing or contradictory, it flags that instead of guessing. Only then does it generate a structured summary that can be reviewed or shared.

The key difference is not the output format. It is the planning, targeted retrieval, and verification before writing. That pattern is what agentic RAG looks like in real business operations.
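To make the contrast concrete, here is a minimal, self-contained Python sketch of the two patterns. It is an illustration of the idea, not how DronaHQ or any particular framework implements it: the keyword-overlap retriever and the string-assembling `generate` are toy stand-ins for a real vector search and a grounded LLM call, and the evidence needs are passed in rather than planned by a model.

```python
from typing import Dict, List

Corpus = List[Dict[str, str]]  # each chunk: {"source": "...", "text": "..."}

def retrieve(corpus: Corpus, query: str, top_k: int = 3) -> Corpus:
    """Toy retriever: rank chunks by keyword overlap with the query."""
    terms = set(query.lower().split())

    def overlap(chunk: Dict[str, str]) -> int:
        return len(terms & set(chunk["text"].lower().split()))

    ranked = sorted(corpus, key=overlap, reverse=True)
    return [c for c in ranked[:top_k] if overlap(c) > 0]

def generate(prompt: str, context: Corpus) -> str:
    """Stand-in for the grounded LLM call: just echoes the evidence it was given."""
    cited = "\n".join(f"- ({c['source']}) {c['text']}" for c in context)
    return f"Answer to: {prompt}\nGrounded in:\n{cited}"

def basic_rag(question: str, corpus: Corpus) -> str:
    # Single pass: retrieve, then generate from whatever came back.
    return generate(question, retrieve(corpus, question))

def agentic_rag(task: str, evidence_needs: List[str], corpus: Corpus) -> str:
    # 1. Plan: a real agent derives these evidence needs from the task itself;
    #    here they are passed in to keep the sketch self-contained.
    # 2. Targeted retrieval: one scoped pass per evidence need.
    evidence = {need: retrieve(corpus, need) for need in evidence_needs}

    # 3. Verify: flag gaps instead of guessing.
    missing = [need for need, chunks in evidence.items() if not chunks]
    if missing:
        return "Cannot draft yet. Missing evidence for: " + ", ".join(missing)

    # 4. Only then generate, grounded in the assembled evidence.
    return generate(task, [chunk for chunks in evidence.values() for chunk in chunks])
```

For the vendor renewal example, the evidence needs would be the renewal terms, recent usage metrics, and past exception notes; if any of those retrievals comes back empty, the agent stops and asks instead of drafting.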
## Building a RAG-based agent for business ops - Real life use case

## Problem statement

Customer stories were valuable, but hard to use. Information about a single customer lived across many places. A published blog might capture the high-level narrative. Internal documents added implementation detail. Transcripts from customer story videos contained the strongest proof points and quotes. Slides and notes added yet another layer.

Any time someone needed a short customer bite, a few bullets for a deck, or a quote focused on a specific theme like integrations, the work was manual. Someone had to search across sources, reconcile overlaps, decide what was current, and rewrite everything for the new context. The cost was not just time. It was inconsistency, outdated facts, and repeated rework.

## Alternatives considered and why they failed

The first and default option was manual curation: ad hoc requests to create custom one-pagers, a bite for an email, or a slide for a deck. ChatGPT and NotebookLM produced fluent answers, but accuracy was unreliable. They mixed customers, invented quotes, and blurred timelines. For customer stories, that risk was unacceptable. Each alternative either did not scale or compromised trust.

## The solution built

The solution was a RAG-based agent scoped specifically to customer stories. Instead of trying to know everything, the agent retrieves only verified, customer-specific material at runtime and generates outputs strictly from that context. It does not rely on general model knowledge for facts or quotes. Over time, this evolved into an agentic RAG system by adding planning, verification, and multi-step behavior.

## How the solution works conceptually

At a high level, the agent treats customer stories as evidence, not prompts. When a request comes in, it first identifies which customer the request refers to and what outputs are required. It then decides which sources are appropriate for each output. Narrative summaries come from blogs or internal docs. Quotes come from transcripts. Metrics come from outcome summaries.

The agent retrieves these pieces separately, assembles them into a working context, and only then generates the final output. Nothing is written before retrieval. Nothing is generated without grounding.
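That routing can be pictured as a small table plus a scoping step. The following Python sketch is illustrative only, not DronaHQ configuration: the field names (`customer`, `source_type`), the routing keys, and `handle_request` itself are hypothetical, and the final grounded generation call is elided.

```python
from typing import Dict, List

# Which artifact types are allowed to ground which output types,
# per the section above. Keys and values are illustrative labels.
SOURCE_ROUTING: Dict[str, List[str]] = {
    "narrative_bite": ["blog", "internal_doc"],
    "bullets":        ["blog", "internal_doc"],
    "quotes":         ["transcript"],
    "metrics":        ["outcome_summary"],
}

def handle_request(customer: str, requested_outputs: List[str], corpus: List[dict]) -> dict:
    # 1. Scope retrieval to artifacts tagged to this customer only.
    scoped = [c for c in corpus if c.get("customer") == customer]

    # 2. Gather context per output, restricted to its allowed source types,
    #    so summaries and quotes never share a context.
    working_context = {
        output: [c for c in scoped if c.get("source_type") in SOURCE_ROUTING.get(output, [])]
        for output in requested_outputs
    }

    # 3. Nothing is written before retrieval: flag gaps instead of guessing.
    missing = [o for o, chunks in working_context.items() if not chunks]
    if missing:
        return {"status": "needs_clarification", "missing": missing}

    # 4. Generation (elided) would then run once per output, grounded only in
    #    that output's own grouped context.
    return {"status": "ready", "context": working_context}
```

The routing table encodes the same constraint the agent's instructions enforce: a quote can only ever come from a transcript, and a metric can only ever come from an outcome summary.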
## How I built a RAG-based agent without coding

I used DronaHQ’s Agentic Platform to build this agent.

- Resources: The first step was defining what the agent is allowed to know. I added only approved, existing resources that already reflect how we work. This included documents, transcripts, long-form pages, and links. These sources live in different formats, but together they represent the full context the agent needs. Tip: do not dump all resources into the system. Multiple sources of truth can confuse the agent. The goal is to give it the right data and nothing else.
- Instructions: Instructions were the most important part of the build. I wrote instructions that explained what the agent is responsible for, how it should interpret vague requests, and what it must never do. I was explicit about avoiding overlap between sources and about asking follow-up questions when the request is underspecified. Most of the iteration happened here. Small changes in instruction quality had a much bigger impact than changing tools or models.
- LLM model: I selected the GPT 5 model and tuned it for lower creativity. The agent’s job is not to be clever. It is to be accurate and consistent. In this case, model choice mattered less than expected. Once the constraints and instructions were solid, the behavior became predictable regardless of model tweaks.
- AI tools: I connected only the tools needed to produce outputs in formats people already use. This included document generation and Google Slides so the agent’s responses could immediately fit into existing tasks.
- Testing: Testing happened continuously. I used the playground within DronaHQ to run real, messy prompts. I deliberately tested vague requests, edge cases, and scenarios where multiple sources could apply. Whenever the output drifted or overlapped, I refined the instructions instead of adding complexity. This tight feedback loop made it easy to improve behavior without rewriting anything.

## Steps involved in a real request

Prompt: “Give me a 100 to 150 word bite on customer XYZ, add three to four bullet points, and include two quote options where the customer talks about integrations.”

- The agent limits its search to artifacts tagged to customer XYZ.
- It identifies three output sections and maps each to a source type: blogs or docs for the bite and bullets, transcripts for quotes about integrations.
- It pulls the most relevant story sections and transcript segments that mention integrations.
- Retrieved content is grouped by purpose so summaries and quotes do not bleed into each other.
- The agent writes the bite and bullets using only retrieved story material and generates quote options derived from transcript language.
- If the retrieved material is insufficient or contradictory, the agent does not guess. It either asks for clarification or limits the output.

This sequence is what turns a scattered knowledge base into reliable, reusable customer storytelling.

## Mapping RAG concepts to the agent accurately

The corpus is the full set of customer story artifacts. Blogs, internal write-ups, outcome summaries, transcripts, and tagged notes form the grounding data. The agent is not allowed to answer beyond this material.

Ingestion brings these artifacts into the system in a retrievable form. Stories are split into logical sections such as problem, solution, integrations, scale, and outcomes. Transcripts are chunked into conversational segments.

Retrieval happens at runtime. When a user mentions a specific customer, the agent scopes retrieval to that customer only. When the user asks for integration-related quotes, the retriever pulls transcript segments that semantically match integration discussions.

Context assembly separates sources by purpose. Blog and document content is used to generate narrative bites and bullets. Transcript content is used to generate quote options.

Generation is constrained. The model is instructed to rely only on retrieved context. If the context is insufficient, the agent asks a clarifying question or declines to invent.

## Where agentic behavior comes in

The first agentic layer is planning. Before retrieving, the agent decomposes the task into outputs and evidence needs. This determines which sources to query and how deeply.

The second layer is multi-pass retrieval. The agent retrieves broadly first, then runs targeted retrieval for gaps such as integrations, metrics, or quotes.

The third layer is verification. The agent builds an internal evidence table mapping each claim to a source. Conflicts across sources are flagged or resolved using predefined rules (sketched after this section).

Quote handling is agentic as well. The agent first extracts verbatim transcript lines related to the requested theme, then selects the strongest candidates. Cleaned versions are generated only after selection.

Slides introduce additional decision making. The agent chooses a narrative structure, retrieves proof points per slide, generates titles and speaker notes, and adapts the deck to the intended audience.

A final quality check loop evaluates grounding, completeness, and format compliance. If the output falls short, the agent revises using additional retrieval.
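The verification layer referenced above can be pictured as an evidence table plus a couple of blocking checks. The sketch below is a hedged illustration under stated assumptions: the keyword-overlap threshold and the numbers-must-appear-in-a-source rule are stand-ins for the predefined rules the real agent follows, which live in its instructions rather than in code.

```python
import re
from typing import Dict, List

def build_evidence_table(claims: List[str], retrieved: List[dict]) -> Dict[str, List[dict]]:
    """Attach to each draft claim the retrieved chunks that share enough keywords with it."""
    table: Dict[str, List[dict]] = {}
    for claim in claims:
        terms = set(claim.lower().split())
        table[claim] = [
            c for c in retrieved
            if len(terms & set(c["text"].lower().split())) >= 3  # illustrative threshold
        ]
    return table

def flag_issues(table: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Return claims that should block generation: unsupported, or containing ungrounded numbers."""
    issues: Dict[str, List[str]] = {"unsupported": [], "ungrounded_numbers": []}
    for claim, chunks in table.items():
        if not chunks:
            issues["unsupported"].append(claim)
            continue
        # Every figure quoted in the claim must appear in at least one supporting chunk.
        source_numbers = {n for c in chunks for n in re.findall(r"\d+%?", c["text"])}
        for number in re.findall(r"\d+%?", claim):
            if number not in source_numbers:
                issues["ungrounded_numbers"].append(claim)
                break
    return issues
```

A flagged claim would trigger the same behavior described throughout this post: the agent either runs another targeted retrieval to fill the gap or asks a clarifying question instead of guessing.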
## Closing thoughts

If you are exploring RAG, start with a workflow where the cost of being wrong is obvious. Customer stories, policies, support runbooks, vendor renewals, and internal SOPs are all good candidates because the source material already exists and the outputs are used by real teams.

A basic RAG system will get you faster answers. The moment you need consistent outputs that hold up under review, you will start adding agentic layers: planning so the system knows what to fetch, multi-pass retrieval so it can fill gaps, verification so it can flag contradictions, and clarifying questions so it stops guessing. Those upgrades are what move the system from “helpful” to dependable.

In my case, the shift was practical. The goal was not an autonomous agent. The goal was to stop redoing the same customer story work every week, and to make sure the outputs stayed grounded in what the customer actually said and what we actually shipped.

If you are building something similar, the biggest unlock is a combination of your stack, how you structure your resources, and the guardrails you add for the agent to overcome what other AI chatbots could not. If you want to see the agent in action, the video above shows a real request end-to-end. If you are building your own RAG or agentic RAG workflow, I would love to hear what corpus you are grounding it on and where it breaks today.