# Building AI-native backends – RAG pipelines, function calling, prompt versioning, LLM observability

admin · 2026-01-28
## The Reality of AI Backends

Two months ago, our internal knowledge-base chatbot confidently told a support rep that our refund policy was "14 days, no questions asked." Our real policy is 30 days, with approval for larger amounts. A $2,000 refund was processed based on that hallucination.

That was the moment we stopped treating LLM features like "smart text boxes" and started treating them like unreliable distributed systems that require real engineering. This article is not about demos. It's about what you have to build after the demo works.

Traditional backends are deterministic: same input → same output. AI backends are probabilistic: the same input produces slightly different output depending on context, model variance, and prompt structure. A production AI backend ends up looking like this:

```
API
 │
AI Orchestrator
 ├─ Guardrails
 ├─ Router
 ├─ Rate limits
 │
 ├─ RAG pipeline
 ├─ Function execution
 └─ Direct generation
 │
Observability + Evals
```

If you skip any of these layers, you will eventually ship a hallucination that costs money.

## RAG Is Not "Chunk, Embed, Query"

The tutorial version of RAG is: split text → embed → vector search → pass to LLM. That works in a notebook. It fails in production.

## Ingestion in Laravel

Ingestion is a queued job, not a script. You re-ingest the same documents constantly, so you re-embed only what changed. The biggest quality improvement you will see is semantic chunking instead of fixed token splits.

## Hybrid Retrieval Is Mandatory

Vector search misses exact matches like order IDs, SKUs, and emails. Keyword search misses meaning. Most hallucinations in RAG systems are actually retrieval failures, not model failures.

## Generation With Grounded Context

What you pass to the model matters more than the model. Low temperature. Structured output. Explicit context. You are trying to reduce creativity, not increase it.

## Function Calling Without Guardrails Will Burn You

Letting an LLM trigger backend actions without controls is equivalent to letting users hit internal APIs directly. Every tool call must go through authorization, rate limiting, and audit logging, with explicit approval for sensitive operations. Refunds, account changes, billing operations: these must never be "just a function call."

## Prompts Are Code

Prompts change behavior more than code does. Never hardcode prompts in PHP files. You will want to change them without redeploying.

## Observability Is Not Optional

You need to log, trace, and evaluate every request: the user query, the retrieved chunks, the final prompt sent, the model output, and the tokens and latency. Without this, you cannot debug hallucinations. You also need automated evaluations that periodically ask: "Is this answer actually grounded in the provided context?" That's how you catch issues before users do.

## Caching and Cost Control

LLM calls are expensive and slow. Cache deterministic calls by hashing inputs. Track cost daily and hard-stop if you exceed budget.

## What Actually Prevents Incidents

After enough production incidents, you realize the real safeguards are not model choice, not fancy agents, not frameworks. Just engineering discipline applied to a probabilistic system.

## Final Takeaway

A demo AI feature looks like magic. A production AI system looks like a paranoid, over-engineered backend. And that's exactly what it needs to be.

The code below walks through these sections in order.
**Ingestion job** (queued, with change detection so only modified documents are re-embedded):

```php
class IngestDocuments
{
    public function handle(SourceInterface $source)
    {
        $documents = $source->fetch();

        foreach ($documents as $doc) {
            // Skip documents whose content has not changed since the last run.
            $hash = sha1($doc->content);
            if (Cache::get("doc_hash_{$doc->id}") === $hash) {
                continue;
            }

            // Semantic chunking beats fixed token splits for retrieval quality.
            $chunks = (new SemanticChunker())->chunk($doc->content);
            $embeddings = app(EmbeddingService::class)->embed($chunks);
            app(VectorStore::class)->upsert($doc->id, $chunks, $embeddings);

            Cache::put("doc_hash_{$doc->id}", $hash, now()->addDay());
        }
    }
}
```
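The `SemanticChunker` above is doing a lot of the quality work. A minimal sketch of the idea, assuming paragraph boundaries as the semantic seams and an illustrative character budget (real versions also split on headings and sentences, and count tokens rather than characters):

```php
<?php
// Minimal semantic chunker sketch: split on paragraph boundaries,
// then greedily pack paragraphs into chunks up to a size budget.
// The 1200-char default is an illustrative stand-in for a token budget.
class SemanticChunker
{
    public function __construct(private int $maxChars = 1200) {}

    /** @return string[] */
    public function chunk(string $text): array
    {
        // Paragraph boundaries are the cheapest "semantic" seams.
        $paragraphs = preg_split('/\n{2,}/', trim($text));
        $chunks = [];
        $current = '';

        foreach ($paragraphs as $p) {
            $p = trim($p);
            if ($p === '') {
                continue;
            }
            // Start a new chunk when adding this paragraph would overflow.
            // (A single paragraph larger than the budget becomes its own
            // oversized chunk; a real implementation would split it further.)
            if ($current !== '' && strlen($current) + strlen($p) + 2 > $this->maxChars) {
                $chunks[] = $current;
                $current = '';
            }
            $current = $current === '' ? $p : $current . "\n\n" . $p;
        }
        if ($current !== '') {
            $chunks[] = $current;
        }
        return $chunks;
    }
}
```

The point is that chunk boundaries follow the document's own structure, so a policy paragraph never gets sliced in half mid-sentence.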
**Hybrid retrieval** (vector + keyword, fused into one ranked list):

```php
class HybridRetriever
{
    public function search(string $query, int $limit = 8)
    {
        // Over-fetch from both backends, then merge into one ranked list.
        $vector  = app(VectorStore::class)->search($query, $limit * 2);
        $keyword = app(KeywordSearch::class)->search($query, $limit * 2);

        return $this->mergeAndRank($vector, $keyword, $limit);
    }
}
```
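`mergeAndRank` is left undefined above. One common way to implement it is Reciprocal Rank Fusion, which needs no score normalization between the two (incomparable) backends. A standalone sketch, assuming each hit carries an `id` key and using the conventional k = 60 constant:

```php
<?php
// Reciprocal Rank Fusion: merge two ranked result lists without having
// to normalize their scores. Each result is an array with an 'id' key.
function mergeAndRank(array $vectorHits, array $keywordHits, int $limit, int $k = 60): array
{
    $scores = [];
    $items = [];

    foreach ([$vectorHits, $keywordHits] as $hits) {
        foreach (array_values($hits) as $rank => $hit) {
            $id = $hit['id'];
            // Each list contributes 1 / (k + rank); items that appear in
            // both lists accumulate both contributions and float upward.
            $scores[$id] = ($scores[$id] ?? 0) + 1 / ($k + $rank + 1);
            $items[$id] = $hit;
        }
    }

    // arsort keeps keys, sorted by fused score descending.
    arsort($scores);
    return array_slice(
        array_map(fn ($id) => $items[$id], array_keys($scores)),
        0,
        $limit
    );
}
```

A document that the vector index ranks mid-list but the keyword index ranks first (an exact order-ID match, say) ends up near the top of the fused list, which is exactly the behavior you want.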
**Grounded generation** (low temperature, structured output, versioned prompt):

```php
class RagResponder
{
    public function answer(string $question, array $chunks)
    {
        $context = collect($chunks)
            ->pluck('content')
            ->join("\n\n");

        // Prompts are versioned data, not hardcoded strings (see PromptManager below).
        $prompt = PromptManager::load('rag-answer', 'v3');

        // Low temperature + JSON output: reduce creativity, not increase it.
        return app(LLM::class)->chat([
            ['role' => 'system', 'content' => $prompt->system],
            ['role' => 'user', 'content' => $prompt->fill([
                'context' => $context,
                'question' => $question,
            ])],
        ], temperature: 0.2, json: true);
    }
}
```
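Structured output is only useful if you actually check it before it reaches users. A sketch of a post-generation guard, assuming the prompt asks the model to return `{answer, citations}` where citations are ids of the retrieved chunks it used (the schema is illustrative, not a fixed API):

```php
<?php
// Reject model output that is malformed, uncited, or cites chunks we
// never retrieved: all three are hallucination smells.
function validateGroundedAnswer(string $raw, array $retrievedChunkIds): ?array
{
    $data = json_decode($raw, true);
    if (!is_array($data)
        || !isset($data['answer'], $data['citations'])
        || !is_array($data['citations'])) {
        return null; // malformed output → fall back or retry
    }
    if ($data['citations'] === []) {
        return null; // an answer with no grounding is not an answer
    }
    foreach ($data['citations'] as $id) {
        if (!in_array($id, $retrievedChunkIds, true)) {
            return null; // cites something we never retrieved
        }
    }
    return $data;
}
```

A `null` here should route to a safe fallback ("I can't find that in our docs"), never to the raw model text.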
**Guarded tool execution** (authorize, rate-limit, and audit every call):

```php
class ToolExecutor
{
    public function execute(string $tool, array $args, User $user)
    {
        $definition = ToolRegistry::get($tool);

        Gate::authorize($definition->ability, $user);

        if ($definition->needsApproval && !$user->isAdmin()) {
            throw new AuthorizationException();
        }

        // Check the limit before recording the hit; otherwise nothing is ever blocked.
        $key = "tool:{$tool}:{$user->id}";
        if (RateLimiter::tooManyAttempts($key, 60)) {
            throw new ThrottleRequestsException();
        }
        RateLimiter::hit($key, 60);

        $result = call_user_func($definition->handler, $args);

        AuditLog::create([
            'user_id' => $user->id,
            'tool' => $tool,
            'args' => $args,
            'result' => $result,
        ]);

        return $result;
    }
}
```
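One guard the executor above does not show is argument validation: model-proposed arguments are untrusted input, exactly like a request body. A sketch, assuming a simple hand-rolled schema shape (`name => ['type' => ..., 'required' => ...]`) rather than any particular validation library:

```php
<?php
// Validate model-proposed arguments against the tool's declared schema
// before the handler ever runs. Undeclared arguments are dropped.
function validateToolArgs(array $args, array $schema): array
{
    $clean = [];
    foreach ($schema as $name => $rule) {
        if (!array_key_exists($name, $args)) {
            if ($rule['required'] ?? false) {
                throw new InvalidArgumentException("Missing argument: {$name}");
            }
            continue;
        }
        $value = $args[$name];
        $okType = match ($rule['type']) {
            'int'    => is_int($value),
            'float'  => is_int($value) || is_float($value),
            'string' => is_string($value),
            'bool'   => is_bool($value),
            default  => false,
        };
        if (!$okType) {
            throw new InvalidArgumentException("Bad type for argument: {$name}");
        }
        $clean[$name] = $value;
    }
    return $clean;
}
```

Dropping undeclared keys matters: a prompt-injected `"force": true` that no schema declares should never reach the handler.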
**Prompt versioning** (prompts live in the database, loaded by name and version):

```php
class Prompt extends Model
{
    protected $casts = ['variables' => 'array'];
}

class PromptManager
{
    public static function load(string $name, string $version): Prompt
    {
        return Prompt::where(compact('name', 'version'))->firstOrFail();
    }
}
```
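`RagResponder` calls `$prompt->fill(...)`, which is not shown above. One way to implement it, assuming `{placeholder}` tokens in the stored template (the token syntax is an assumption of this sketch, not a Laravel convention):

```php
<?php
// Fill a stored prompt template: replace {name} tokens with values in a
// single strtr pass, so substituted values are never re-substituted.
function fillTemplate(string $template, array $vars): string
{
    $replacements = [];
    foreach ($vars as $name => $value) {
        $replacements['{' . $name . '}'] = (string) $value;
    }
    return strtr($template, $replacements);
}
```

The single-pass property of `strtr` is the point: even if the retrieved context happens to contain `{question}`, it is not expanded again.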
**Caching deterministic calls** (hash the full payload, reuse the response):

```php
class CachedLLM
{
    public function chat(array $payload)
    {
        // Identical payload (messages, model, params) → identical cache key.
        $key = hash('sha256', json_encode($payload));

        return Cache::remember($key, 3600, fn () =>
            app(LLM::class)->chat($payload)
        );
    }
}
```
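"Track cost daily and hard-stop if you exceed budget" deserves code too. A sketch of the budget check, with an in-memory array standing in for Redis or the cache, and an illustrative default budget:

```php
<?php
// Daily hard-stop sketch: accumulate spend per calendar day and refuse
// further LLM calls once the budget is reached. The array store stands in
// for Redis/Cache, and the $25/day default is an illustrative number.
class CostGuard
{
    private array $spentByDay = [];

    public function __construct(private float $dailyBudgetUsd = 25.0) {}

    public function record(string $day, float $costUsd): void
    {
        $this->spentByDay[$day] = ($this->spentByDay[$day] ?? 0.0) + $costUsd;
    }

    public function allow(string $day): bool
    {
        return ($this->spentByDay[$day] ?? 0.0) < $this->dailyBudgetUsd;
    }
}
```

Wire `allow()` in front of every LLM call and `record()` behind it; a blown budget should degrade to cached or canned responses, not silently keep spending.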
To recap, the operating assumptions of an AI backend:

- You cannot trust outputs
- You cannot trust retrieval
- You cannot trust prompts
- You cannot trust tool calls
- You must observe everything

What production RAG actually requires:

- A proper ingestion pipeline
- Semantic chunking
- Change detection
- Hybrid retrieval (vector + keyword)
- Ongoing evaluation of retrieval quality

What every tool call goes through:

- Authorization
- Rate limiting
- Audit logging
- Optional approval
- Gradual rollout

What to log and trace for every request:

- The user query
- Retrieved chunks
- Final prompt sent
- Model output
- Tokens and latency

The real safeguards:

- Hybrid retrieval
- Strict prompts
- Guarded tool execution
- Full tracing
- Automated evals
- Aggressive caching
#how-to #tutorial #guide #ai #llm #router